BRIEF ON APPEAL



Sir: 

Further to the Notice of Appeal filed June 11, 2003, and received by the USPTO on June 
13, 2003, herewith are three copies of Appellants' Brief on Appeal. Authorized fees include the 
statutory fee of $420 for a two-month extension of time, as well as the $330.00 fee for the filing
of this Brief. 

This is an appeal from the decision of the Examiner finally rejecting claims 3, 5-6, 8,
11-12, 14-17 and 20-21 of the above-identified application.



(1) REAL PARTY IN INTEREST
The above-identified application is assigned of record to Incyte Pharmaceuticals, Inc.
(now Incyte Corporation, formerly known as Incyte Genomics, Inc.) (Reel 010232, Frame 0779),
which is the real party in interest herein.

(2) RELATED APPEALS AND INTERFERENCES
Appellants, their legal representative and the assignee are not aware of any related 
appeals or interferences which will directly affect or be directly affected by or have a bearing on 
the Board's decision in the instant appeal. 



(3) STATUS OF THE CLAIMS
Claims rejected: Claims 3, 5-6, 8, 11-12, 14-17 and 20-21

Claims allowed: (none)

Claims canceled: Claims 1 and 2

Claims withdrawn: Claims 4, 7, 9, 10, 13, 18 and 19

Claims on Appeal: Claims 3, 5-6, 8, 11-12, 14-17 and 20-21 (A copy of the claims on
appeal, as amended, can be found in the attached Appendix).



(4) STATUS OF AMENDMENTS AFTER FINAL 
There were no amendments made after final. 



(5) SUMMARY OF THE INVENTION 
Embodiments of Appellants' invention are directed to an isolated antibody which 
specifically binds to a polypeptide, human growth-associated protease inhibitor heavy chain 
precursor, abbreviated as "GAPIP". Appellants' invention includes antibodies which specifically 
bind growth-associated protease inhibitor heavy chain precursors selected from among 
polypeptides comprising the amino acid sequence of SEQ ID NO:1 (See the Specification, e.g., at
page 3, lines 1-10 and the Sequence Listing), polypeptides comprising a naturally occurring
amino acid sequence at least 90% identical to the amino acid sequence of SEQ ID NO:1, said
naturally occurring amino acid sequence encoding a polypeptide having protease inhibitor
activity (See the Specification, e.g., at page 16, lines 16-19), and an immunogenic fragment of a
polypeptide comprising the amino acid sequence of SEQ ID NO:1 (See, e.g., at page 8, lines 1-5
and page 45, lines 7-8).

As described in the Specification at page 14, line 18 to page 15, line 11:
In one embodiment, the invention encompasses a polypeptide comprising the amino acid
sequence of SEQ ID NO:1, as shown in Figures 1A, 1B, 1C, 1D, 1E, 1F, 1G, 1H, 1I, and
1J. GAPIP is 942 amino acids in length and has eight potential N-glycosylation sites at
residues N97, N127, N231, N421, N508, N776, N795, and N862; twelve potential casein
kinase II phosphorylation sites at residues S17, S28, T112, T129, S158, S269, S354,
T410, T581, S592, T676, and S754; two potential glycosaminoglycan attachment sites at
residues S213 and S391; seventeen potential protein kinase C phosphorylation sites at
residues S55, S70, T112, S175, S182, S213, S337, S354, T416, T458, S535, S559, T581,
S611, S620, S651, and T880; one potential tyrosine kinase phosphorylation site at residue
Y919; a potential signal peptide sequence from M1 to C14; and a vWFA3 domain, which
contains the potential metal-binding site glycine-amino acid-serine-amino acid-serine,
from N295 to N440. As shown in Figures 2A, 2B, 2C, 2D, 2E, 2F, and 2G, GAPIP has
chemical and structural similarity with human pre-inter-α-trypsin inhibitor (GI 33985;
SEQ ID NO:3), human pre-inter-α-trypsin inhibitor heavy chain H1 (GI 33989; SEQ ID
NO:4), and human pre-inter-α-trypsin inhibitor heavy chain H3 (GI 288563; SEQ ID
NO:5). In particular, GAPIP and human pre-inter-α-trypsin inhibitor share 28% identity,
one potential N-glycosylation site, four potential casein kinase II phosphorylation sites,
four potential protein kinase C phosphorylation sites, the potential signal peptide
sequence, and the vWFA3 potential metal-binding site glycine-amino acid-serine-amino
acid-serine. In addition, GAPIP and human pre-inter-α-trypsin inhibitor heavy chains H1
and H3 share 27% and 23% identity, respectively, one potential N-glycosylation site,
four potential casein kinase II phosphorylation sites, five potential protein kinase C
phosphorylation sites, the potential signal peptide sequence, and the vWFA3 potential
metal-binding site glycine-amino acid-serine-amino acid-serine. As illustrated by Figure
3, GAPIP and human pre-inter-α-trypsin inhibitor heavy chains share a common
phylogenic heritage. A fragment of SEQ ID NO:2 from about nucleotide 982 to about
nucleotide 1011 is useful, for example, for designing oligonucleotides or as a
hybridization probe. Northern analysis shows the expression of this sequence in various
libraries, at least 63% of which are immortalized or cancerous and at least 26% of which
involve immune response. Of particular note is the expression of GAPIP in reproductive,
gastrointestinal, nervous, and fetal tissues.

The polypeptides and antibodies of the present invention have a variety of utilities. In 
particular, they can be used in expression profiling, for toxicology testing, for drug discovery, 
and for the diagnosis, prevention, and treatment of reproductive, developmental, neoplastic, and 
immunological disorders. (See the Specification, e.g., at page 14, lines 5-9; page 32, lines
19-30; and page 38, lines 6-22.)

(6) ISSUES 

1. Whether claims 3, 5-6, 8, 11-12, 14-17 and 20-21 meet the utility requirement of
35 U.S.C. § 101.

2. Whether one of ordinary skill in the art would know how to use the claimed 
antibodies, e.g., in toxicology testing, drug development, and the diagnosis of
disease, so as to satisfy the enablement requirement of 35 U.S.C. § 112, first 
paragraph. 

3. Whether claims 3, 5-6, 8, 11-12, 14-17 and 20-21 meet the enablement 
requirement of 35 U.S.C. § 112, first paragraph with respect to whether one of 
ordinary skill in the art would be able to make and use antibodies which 
specifically bind polypeptides comprising naturally occurring amino acid 
sequences that are at least 90% identical to the amino acid sequence of SEQ ID
NO:1.

4. Whether claims 3, 5-6, 8, 11-12, 14-17 and 20-21 meet the written description 
requirement of 35 U.S.C. § 112, first paragraph. 

(7) GROUPING OF THE CLAIMS 

As to Issue 1 

All of the claims on appeal are grouped together. 
As to Issue 2 

All of the claims on appeal are grouped together. 
As to Issue 3 

Claims 3, 5, 6, 8, 11, 12 and 14-17 should be considered separately from claims 20 and
21.

As to Issue 4 

Claims 3, 5, 6, 8, 11, 12 and 14-17 should be considered separately from claims 20 and
21.

(8) APPELLANTS' ARGUMENTS
Issue 1 - Utility rejection Under 35 U.S.C. § 101 

Claims 3, 5-6, 8, 11-12, 14-17 and 20-21 stand rejected under 35 U.S.C. §§ 101 and 112, 
first paragraph, based on the allegation that the claimed invention lacks patentable utility. The 
rejection alleges in particular that "the claimed invention is not supported by either a specific and 
substantial asserted utility, a credible asserted utility or a well-established utility." (3/22/03 
Office Action, at page 2). 

The rejection of claims 3, 5-6, 8, 11-12, 14-17 and 20-21 is improper, as the inventions of 
those claims have a patentable utility as set forth in the instant specification, and/or a utility
well-known to one of ordinary skill in the art.

The invention at issue is identified in the patent application as an antibody that 
specifically binds to growth-associated protease inhibitor heavy chain precursor (GAPIP), which 
is a polypeptide encoded by a gene that is expressed in reproductive, gastrointestinal, nervous, 
and fetal tissues (see the Specification, e.g., at page 16, lines 16-22 and page 28, lines 12-14). 
The novel polypeptide GAPIP to which the claimed antibody specifically binds is demonstrated
in the Specification to be a member of the protease inhibitor family (see the Specification at
pages 14-15), whose biological functions include regulating the activity and effect of proteases
and controlling the pathogenesis of proteolytic disorders, and whose members have been used in
the treatment of HIV (see the Specification at page 2, lines 21-23). As such, the claimed
invention has numerous practical, beneficial uses in toxicology testing, drug development, and
the diagnosis of disease, none of which requires knowledge of how the polypeptide actually
functions. As a result of the benefits of these uses, the claimed invention already enjoys
significant commercial success.

The fact that the polypeptide to which the claimed antibody specifically binds is a 
member of the protease inhibitor family alone demonstrates utility beyond the reasonable 
probability required by law. Each of the members of this class, regardless of its particular
function, is useful. There is no evidence that any member of this class of polypeptides, let
alone a substantial number of them, would not have some patentable utility. It follows that there
is a more than substantial likelihood that the claimed antibody and the SEQ ID NO:1 polypeptide
to which the claimed antibody specifically binds also have patentable utility, regardless of their
actual function. The law has never required a patentee to prove more.

There is, in addition, direct proof of the utility of the claimed invention. The Declaration 
of Lars Michael Furness (previously submitted with the Office Response of 11/18/02) describes 
some of the practical uses of the claimed invention in gene and protein expression monitoring 
applications as they would have been understood at the time of the patent application. The 
Furness Declaration describes, in particular, how the claimed antibody and the SEQ ID NO:1
polypeptide to which the claimed antibody specifically binds can be used in protein expression
analysis techniques such as 2-D PAGE gels and western blots. Using the claimed invention with
these techniques, persons of ordinary skill in the art can better assess, for example, the potential
toxic effect of a drug candidate (Furness Declaration at ¶ 10).

The Patent Examiner does not dispute that the claimed antibody and the SEQ ID NO:1
polypeptide to which the claimed antibody specifically binds can be used in 2-D PAGE gels and
western blots to perform drug toxicity testing. Instead, the Patent Examiner contends that the
SEQ ID NO:1 polypeptide to which the claimed antibody specifically binds cannot be useful
without precise knowledge of its function. But the law never has required knowledge of 
biological function to prove utility. It is the claimed invention's uses, not its functions, that are 
the subject of a proper analysis under the utility requirement. 

As demonstrated by the Furness Declaration, the person of ordinary skill in the art can 
achieve beneficial results from the claimed antibody and the SEQ ID NO:1 polypeptide to which
the claimed antibody specifically binds in the absence of any knowledge as to the precise
function of the protein. The uses of the claimed antibody and the SEQ ID NO:1 polypeptide to
which the claimed antibody specifically binds for gene expression monitoring applications 
including toxicology testing are in fact independent of its precise function. 

I. The Applicable Legal Standard

To meet the utility requirement of sections 101 and 112 of the Patent Act, the patent
applicant need only show that the claimed invention is "practically useful," Anderson v. Natta, 
480 F.2d 1392, 1397, 178 USPQ 458 (CCPA 1973) and confers a "specific benefit" on the 
public. Brenner v. Manson, 383 U.S. 519, 534-35, 148 USPQ 689 (1966). As discussed in a 
recent Court of Appeals for the Federal Circuit case, this threshold is not high:
An invention is "useful" under section 101 if it is capable of providing some
identifiable benefit. See Brenner v. Manson, 383 U.S. 519, 534 [148 USPQ 689]
(1966); Brooktree Corp. v. Advanced Micro Devices, Inc., 977 F.2d 1555, 1571
[24 USPQ2d 1401] (Fed. Cir. 1992) ("to violate Section 101 the claimed device
must be totally incapable of achieving a useful result"); Fuller v. Berger, 120 F.
274, 275 (7th Cir. 1903) (test for utility is whether invention "is incapable of
serving any beneficial end").

Juicy Whip Inc. v. Orange Bang Inc., 51 USPQ2d 1700 (Fed. Cir. 1999). 

While an asserted utility must be described with specificity, the patent applicant need not 
demonstrate utility to a certainty. In Stiftung v. Renishaw PLC, 945 F.2d 1173, 1180,
20 USPQ2d 1094 (Fed. Cir. 1991), the United States Court of Appeals for the Federal Circuit 
explained: 

An invention need not be the best or only way to accomplish a certain result, and 
it need only be useful to some extent and in certain applications: "[T]he fact that 
an invention has only limited utility and is only operable in certain applications is 
not grounds for finding lack of utility." Envirotech Corp. v. Al George, Inc., 730 
F.2d 753, 762, 221 USPQ 473, 480 (Fed. Cir. 1984). 

The specificity requirement is not, therefore, an onerous one. If the asserted utility is 
described so that a person of ordinary skill in the art would understand how to use the claimed 
invention, it is sufficiently specific. See Standard Oil Co. v. Montedison, S.p.a., 212 U.S.P.Q. 
327, 343 (3d Cir. 1981). The specificity requirement is met unless the asserted utility amounts to 
a "nebulous expression" such as "biological activity" or "biological properties" that does not 
convey meaningful information about the utility of what is being claimed. Cross v. Iizuka, 
753 F.2d 1040, 1048 (Fed. Cir. 1985). 

In addition to conferring a specific benefit on the public, the benefit must also be 
"substantial." Brenner, 383 U.S. at 534. A "substantial" utility is a practical, "real-world" 
utility. Nelson v. Bowler, 626 F.2d 853, 856, 206 USPQ 881 (CCPA 1980). 

If persons of ordinary skill in the art would understand that there is a "well-established" 
utility for the claimed invention, the threshold is met automatically and the applicant need not 
make any showing to demonstrate utility. Manual of Patent Examining Procedure at
§ 706.03(a). Only if there is no "well-established" utility for the claimed invention must the 
applicant demonstrate the practical benefits of the invention. Id. 

Once the patent applicant identifies a specific utility, the claimed invention is presumed
to possess it. See In re Cortright, 165 F.3d 1353, 1357, 49 USPQ2d 1464 (Fed. Cir. 1999); In re
Brana, 51 F.3d 1560, 1566, 34 USPQ2d 1436 (Fed. Cir. 1995). In that case, the Patent Office
bears the burden of demonstrating that a person of ordinary skill in the art would reasonably
doubt that the asserted utility could be achieved by the claimed invention. Id. To do so, the
Patent Office must provide evidence or sound scientific reasoning. See In re Langer, 503 F.2d
1380, 1391-92, 183 USPQ 288 (CCPA 1974). If, and only if, the Patent Office makes such a 
showing, the burden shifts to the applicant to provide rebuttal evidence that would convince the 
person of ordinary skill that there is sufficient proof of utility. Brana, 51 F.3d at 1566. The 
applicant need only prove a "substantial likelihood" of utility; certainty is not required. Brenner, 
383 U.S. at 532. 

II. Uses of the claimed antibodies for the diagnosis of conditions or disorders
characterized by expression of GAPIP, for toxicology testing, and for drug discovery are 
sufficient utilities under 35 U.S.C. §§ 101 and 112, first paragraph 

The claimed invention meets all of the necessary requirements for establishing a credible 
utility under the Patent Law: There are "well-established" uses for the claimed invention known 
to persons of ordinary skill in the art, and there are specific practical and beneficial uses for the 
invention disclosed in the patent application's specification. These uses are explained, in detail, 
in the Furness Declaration accompanying this response. Objective evidence further corroborates 
the credibility of the asserted utilities. 

A. The Specification discloses that SEQ ID NO:1 is GAPIP and therefore
there is an asserted utility for the claimed antibodies 

The Examiner alleges that nowhere is it disclosed in the instant specification that SEQ ID 
NO:1 is GAPIP and, therefore, technically there is no asserted utility for the claimed antibodies
(See 6/17/02 Office Action, at page 3). Such, however, is not the case. For example, in the
Brief Description of the Figures, it is explicitly stated that "Figures 1A, 1B, 1C, 1D, 1E, 1F, 1G,
1H, 1I, and 1J show the amino acid sequence (SEQ ID NO:1) and nucleic acid sequence (SEQ ID
NO:2) of GAPIP." See the Specification at page 5, lines 2-3. Similar statements identifying
SEQ ID NO:1 as an amino acid sequence of GAPIP can be found throughout the Brief
Description of the Figures at page 5, lines 5-11. 

B. The similarity of the SEQ ID NO:1 polypeptide to which the claimed
antibody specifically binds to another of undisputed utility demonstrates
utility

Because there is a substantial likelihood that the claimed GAPIP is functionally related to
human pre-inter-α-trypsin inhibitor, human pre-inter-α-trypsin inhibitor heavy chain H1, and
pre-inter-α-trypsin inhibitor heavy chain H3, polypeptides of undisputed utility, there is by
implication a substantial likelihood that the claimed polypeptide is similarly useful. Appellants
need not show any more to demonstrate utility. See In re Brana, 51 F.3d at 1567.

It is undisputed, and readily apparent from the patent application, that the SEQ ID NO:1
polypeptide to which the claimed antibody specifically binds shares more than 40% sequence
identity over 70 amino acid residues with human pre-inter-α-trypsin inhibitor, human
pre-inter-α-trypsin inhibitor heavy chain H1, and pre-inter-α-trypsin inhibitor heavy chain H3.
For example, over the 70 amino acid residues from G271 to I340 of SEQ ID NO:1, human
pre-inter-α-trypsin inhibitor, human pre-inter-α-trypsin inhibitor heavy chain H1, and
pre-inter-α-trypsin inhibitor heavy chain H3 are 55%, 48% and 54% identical, respectively.
This is more than enough homology to demonstrate a reasonable probability that the utility of
human pre-inter-α-trypsin inhibitor, human pre-inter-α-trypsin inhibitor heavy chain H1, and
pre-inter-α-trypsin inhibitor heavy chain H3 can be imputed to the claimed invention. It is
well-known that the probability that two unrelated polypeptides share more than 40% sequence
homology over 70 amino acid residues is exceedingly small. Brenner et al., Proc. Natl. Acad.
Sci. 95:6073-78 (1998). Given homology in excess of 40% over many more than 70 amino acid
residues, the probability that the claimed polypeptide is related to human pre-inter-α-trypsin
inhibitor, human pre-inter-α-trypsin inhibitor heavy chain H1, and pre-inter-α-trypsin inhibitor
heavy chain H3 is, accordingly, very high.

The Patent Office must accept the Appellants' demonstration that the homology between
the claimed invention and human pre-inter-α-trypsin inhibitor, human pre-inter-α-trypsin
inhibitor heavy chain H1, and pre-inter-α-trypsin inhibitor heavy chain H3 demonstrates utility
by a reasonable probability unless the Patent Office can demonstrate through evidence or sound
scientific reasoning that a person of ordinary skill in the art would doubt utility. See In re
Langer, 503 F.2d 1380, 1391-92, 183 USPQ 288 (CCPA 1974). The Examiner has not provided
sufficient evidence or sound scientific reasoning to the contrary. 

C. The uses of the claimed antibody and the SEQ ID NO:1 polypeptide to which
the antibody specifically binds for toxicology testing, drug discovery, and 
disease diagnosis are practical uses that confer "specific benefits" to the 
public 

The claimed invention has specific, substantial, real-world utility by virtue of its use in 
toxicology testing, drug development and disease diagnosis through gene expression profiling. 
These uses are explained in detail in the Furness Declaration. There is no dispute that the 
claimed invention is in fact a useful tool in two-dimensional polyacrylamide gel electrophoresis 
("2-D PAGE") analysis and western blots used to monitor protein expression and assess drug 
toxicity. 

The instant application, Serial No. 09/828,423, filed on April 5, 2001 (hereinafter 
"Hillman '423 application"), is a divisional of and claims priority to U.S. application Serial No. 
09/388,774 filed September 2, 1999, issued May 8, 2001 as U.S. Patent No. 6,228,991, which is 
a divisional application and claimed priority to U.S. application Serial No. 09/074,579 filed May 
7, 1998, issued December 14, 1999 as U.S. Patent No. 6,001,596 (hereinafter "the Hillman '579 
application"), all having the identical specification, with the exception of corrected
typographical errors and reformatting. 

In his Declaration, Mr. Furness explains the many reasons why a person skilled in the art 
who read the Hillman '579 application on May 7, 1998 would have understood that application 
to disclose the claimed antibody and the SEQ ID NO:1 polypeptide to which the claimed
antibody specifically binds to be useful for a number of gene and protein expression monitoring
applications, e.g., in 2-D PAGE technologies, in connection with the development of drugs and
the monitoring of the activity of such drugs. (Furness Declaration at, e.g., ¶¶ 9-13). Much, but
not all, of Mr. Furness' explanation concerns the use of the claimed antibody and the SEQ ID
NO:1 polypeptide to which the claimed antibody specifically binds in the creation of protein
expression maps using 2-D PAGE. 

Docket No.: PF-0505-2 DIV 

2-D PAGE technologies were developed during the 1980's. Since the early 1990's, 2-D 
PAGE has been used to create maps showing the differential expression of proteins in different 
cell types or in similar cell types in response to drugs and potential toxic agents. Each expression 
pattern reveals the state of a tissue or cell type in its given environment, e.g., in the presence or 
absence of a drug. By comparing a map of cells treated with a potential drug candidate to a map 
of cells not treated with the candidate, for example, the potential toxicity of a drug can be 
assessed (see Furness Declaration at ¶ 10).

The claimed invention makes 2-D PAGE analysis a more powerful tool for toxicology
and drug efficacy testing. A person of ordinary skill in the art can derive more information about
the state or states of tissue or cell samples from 2-D PAGE analysis with the claimed invention
than without it. As Mr. Furness explains:

In view of the Hillman '579 application, the Wilkins article, and other related pre- 
May 7, 1998 publications, persons skilled in the art on May 7, 1998 clearly would 
have understood the Hillman '579 application to disclose the claimed antibody 
and the SEQ ID NO:1 polypeptide to which the claimed antibody specifically
binds to be useful in 2-D PAGE analyses for the development of new drugs and
monitoring the activities of drugs for such purposes as evaluating their efficacy
and toxicity .... (Furness Declaration, ¶ 10)

* * *

Persons skilled in the art would appreciate that a 2-D PAGE map that utilized the 
claimed antibody and the SEQ ID NO:1 polypeptide to which the claimed
antibody specifically binds would be a more useful tool than a 2-D PAGE map
that did not utilize this protein sequence in connection with conducting protein
expression monitoring studies on proposed (or actual) drugs for treating
reproductive, developmental, neoplastic, and immunological disorders for such
purposes as evaluating their efficacy and toxicity. (Furness Declaration, ¶ 12)

Mr. Furness' observations are confirmed in the literature published before the filing of the 
patent application. Wilkins, for example, describes how 2-D gels are used to define proteins 
present in various tissues and measure their levels of expression, the data from which is in turn 
used in databases: 

For proteome projects, the aim of [computer-aided 2-D PAGE] analysis ... is to 
catalogue all spots from the 2-D gel in a qualitative and if possible quantitative 
manner, so as to define the number of proteins present and their levels of 
expression. Reference gel images, constructed from one or more gels, form the
basis of two-dimensional gel databases. (Wilkins, Tab C, p. 26).


D. The use of proteins expressed by humans as tools for toxicology testing, drug 
discovery, and the diagnosis of disease is now "well-established" 

The technologies made possible by expression profiling using polypeptides are now well- 
established. The technical literature recognizes not only the prevalence of these technologies, but 
also their unprecedented advantages in drug development, testing and safety assessment. These 
technologies include toxicology testing, as described by Furness in his Declaration. 

Toxicology testing is now standard practice in the pharmaceutical industry. See, e.g.,
John C. Rockett, et al., Differential gene expression in drug metabolism and toxicology:
practicalities, problems, and potential, Xenobiotica 29:655-691 (July 1999) (Reference No. 1):

Knowledge of toxin-dependent regulation in target tissues is not solely an 
academic pursuit as much interest has been generated in the pharmaceutical 
industry to harness this technology in the early identification of toxic drug 
candidates, thereby shortening the developmental process and contributing 
substantially to the safety assessment of new drugs. (Reference No. 1, page 656) 

To the same effect are several other scientific publications, including Emile F. Nuwaysir, et al.,
Microarrays and Toxicology: The Advent of Toxicogenomics, Molecular Carcinogenesis
24:153-159 (1999) (Reference No. 2); Sandra Steiner and N. Leigh Anderson, Expression
profiling in toxicology - potentials and limitations, Toxicology Letters 112-13:467-471 (2000)
(Reference No. 3).

The more genes - and, accordingly, the polypeptides they encode - that are available for
use in toxicology testing, the more powerful the technique. Control genes are carefully selected 
for their stability across a large set of array experiments in order to best study the effect of 
toxicological compounds. See attached email from the primary investigator, Dr. Cynthia Afshari 
to an Incyte employee, dated July 3, 2000, as well as the original message to which she was 
responding (Reference No. 4). Thus, there is no expressed gene which is irrelevant to screening 
for toxicological effects, and all expressed genes have a utility for toxicological screening. 

In fact, the potential benefits to the public, in terms of lives saved and reduced health care
costs, are enormous. Recent developments provide evidence that the benefits of this information
are already beginning to manifest themselves. Examples include the following:

• In 1999, CV Therapeutics, an Incyte collaborator, was able to use Incyte gene
expression technology, information about the structure of a known transporter
gene, and chromosomal mapping location, to identify the key gene associated with
Tangier disease. This discovery took place over a matter of only a few weeks, due
to the power of these new genomics technologies. The discovery received an
award from the American Heart Association as one of the top 10 discoveries
associated with heart disease research in 1999.

• In an April 9, 2000, article published by the Bloomberg news service, an Incyte
customer stated that it had reduced the time associated with target discovery and
validation from 36 months to 18 months, through use of Incyte's genomic
information database. Other Incyte customers have privately reported similar
experiences. The implications of this significant saving of time and expense for
the number of drugs that may be developed and their cost are obvious.

• In a February 10, 2000, article in the Wall Street Journal, one Incyte customer
stated that over 50 percent of the drug targets in its current pipeline were derived
from the Incyte database. Other Incyte customers have privately reported similar
experiences. By doubling the number of targets available to pharmaceutical
researchers, Incyte genomic information has demonstrably accelerated the
development of new drugs.

Because the Patent Examiner failed to address or consider the "well-established" utilities 
for the claimed invention in toxicology testing, drug development, and the diagnosis of disease, 
the Examiner's rejections should be reversed. 



E. Objective evidence corroborates the utilities of the claimed invention 

There is in fact no restriction on the kinds of evidence a Patent Examiner may consider in 
determining whether a "real-world" utility exists. "Real-world" evidence, such as evidence 
showing actual use or commercial success of the invention, can demonstrate conclusive proof of 
utility. Raytheon v. Roper, 220 USPQ 592 (Fed. Cir. 1983); Nestle v. Eugene, 55 F.2d 854,
856, 12 USPQ 335 (6th Cir. 1932). Indeed, proof that the invention is made, used or sold by any 
person or entity other than the patentee is conclusive proof of utility. United States Steel Corp. v. 
Phillips Petroleum Co., 865 F.2d 1247, 1252, 9 USPQ2d 1461 (Fed. Cir. 1989). 

Over the past several years, a vibrant market has developed for databases containing all 
expressed genes (along with the polypeptide translations of those genes). (Note that the value in 
these databases is enhanced by their completeness, but each sequence in them is independently 
valuable.) The databases sold by Appellants' assignee, Incyte, include exactly the kinds of 
information made possible by the claimed invention, such as tissue and disease associations.
Incyte sells its database containing the GAPIP sequence and millions of other sequences
throughout the scientific community, including to pharmaceutical companies who use the 
information to develop new pharmaceuticals. 

Both Incyte's customers and the scientific community have acknowledged that Incyte's
databases have proven to be valuable in, for example, the identification and development of drug
candidates. As Incyte adds information to its databases, including the information that can be
generated only as a result of Incyte's discovery of the claimed antibody and the SEQ ID NO:1
polypeptide to which the claimed antibody specifically binds, the databases become even more
powerful tools. Thus the claimed invention adds more than incremental benefit to the drug 
discovery and development process. 

III. The Patent Examiner's Rejections Are Without Merit 

Rather than responding to the evidence demonstrating utility, the Examiner attempts to 
dismiss it altogether by alleging that the disclosed and well-established utilities for the claimed 
antibody and the SEQ ID NO:1 polypeptide to which the claimed antibody specifically binds are
not a specific and substantial asserted utility, credible asserted utility or well-established utility 
(see 6/17/02 Office Action, at page 2). The Examiner is incorrect both as a matter of law and as 
a matter of fact. 

A. The Precise Biological Role Or Function Of An Expressed Polypeptide Is Not 
Required To Demonstrate Utility 

The Patent Examiner's primary rejection of the claimed invention is based on the ground 
that, without information as to the precise "biological role" of the claimed invention, the claimed 
invention's utility is not sufficiently specific. According to the Examiner, it is not enough that a 
person of ordinary skill in the art could use and, in fact, would want to use the claimed invention 
either by itself or in a 2-D gel or western blot to monitor the expression of genes for such 
applications as the evaluation of a drug's efficacy and toxicity. The Examiner would require, in 
addition, that the applicant provide a specific and substantial interpretation of the results 
generated in any given expression analysis. 

It may be that specific and substantial interpretations and detailed information on 
biological function are necessary to satisfy the requirements for publication in some technical
journals, but they are not necessary to satisfy the requirements for obtaining a United States 
patent. The relevant question is not, as the Examiner would have it, whether it is known how or 
why the invention works, In re Cortright, 165 F.3d at 1359, but rather whether the invention
provides an "identifiable benefit" in presently available form. Juicy Whip Inc. v. Orange Bang 
Inc., 185 F.3d at 1366. If the benefit exists, and there is a substantial likelihood the invention 
provides the benefit, it is useful. There can be no doubt, particularly in view of the Furness 
Declaration (at, e.g., ¶¶ 9-13), that the present invention meets this test.

The threshold for determining whether an invention produces an identifiable benefit is 
low. Juicy Whip, 185 F.3d at 1366. Only those utilities that are so nebulous that a person of 
ordinary skill in the art would not know how to achieve an identifiable benefit and, at least 
according to the PTO guidelines, so-called "throwaway" utilities that are not directed to a person 
of ordinary skill in the art at all, do not meet the statutory requirement of utility. Utility 
Examination Guidelines, 66 Fed. Reg. 1092 (Jan. 5, 2001). 

Knowledge of the biological function or role of a biological molecule has never been
required to show real-world benefit. In its most recent explanation of its own utility guidelines,
the PTO acknowledged as much (66 F.R. at 1095):

[T]he utility of a claimed DNA does not necessarily depend on the function of the 
encoded gene product. A claimed DNA may have specific and substantial utility 
because, e.g., it hybridizes near a disease-associated gene or it has gene-regulating 
activity. 

By implicitly requiring knowledge of biological function for the SEQ ID NO:1
polypeptide to which the claimed antibody specifically binds, the Examiner has, contrary to law, 
elevated what is at most an evidentiary factor into an absolute requirement of utility. Rather than 
looking to the biological role or function of the claimed invention, the Examiner should have 
looked first to the benefits it is alleged to provide. 

B. Membership in a Class of Useful Products Can Be Proof of Utility 

Despite the uncontradicted evidence that the claimed polypeptide is a member of the 
protease inhibitor family, whose members indisputably are useful, the Examiner refused to 
impute the utility of the members of the protease inhibitor family to GAPIP. The Patent
Examiner takes the position that unless Appellants can identify which particular biological
function within the class of protease inhibitors is possessed by GAPIP, utility cannot be imputed.
(See 6/17/02 Office Action, at pages 3-4.) To demonstrate utility by membership in the class of
protease inhibitors, the Examiner would require that all protease inhibitors possess a "common" 
utility. 

There is no such requirement in the law. In order to demonstrate utility by membership in 
a class, the law requires only that the class not contain a substantial number of useless members. 
So long as the class does not contain a substantial number of useless members, there is sufficient 
likelihood that the claimed invention will have utility and a rejection under 35 U.S.C. § 101 is 
improper. That is true regardless of how the claimed invention ultimately is used and whether 
the members of the class possess one utility or many. See Brenner v. Manson, 383 U.S. 519, 532 
(1966); Application of Kirk, 376 F.2d 936, 943 (CCPA 1967). 

Membership in a "general" class is insufficient to demonstrate utility only if the class 
contains a substantial number of useless members. There would be, in that case, a substantial 
likelihood that the claimed invention is one of the useless members of the class. In the few cases 
in which class membership did not prove utility by substantial likelihood, the classes did in fact 
include predominately useless members, e.g., Brenner (man-made steroids); Kirk (same); Natta 
(man-made polyethylene polymers). 1 

The Examiner addresses GAPIP as if the general class in which it is included is not the 
protease inhibitor family, but rather all polypeptides, including the vast majority of useless 
theoretical molecules not occurring in nature, and thus not pre-selected by nature to be useful. 
While these "general classes" may contain a substantial number of useless members, the protease 
inhibitor family does not. The protease inhibitor family is sufficiently specific to rule out any 
reasonable possibility that GAPIP would not also be useful like the other members of the family. 

Because the Examiner has not presented any evidence that the protease inhibitor class of
proteins has any, let alone a substantial number, of useless members, the Examiner must
conclude that there is a "substantial likelihood" that the GAPIP encoded by the claimed
polypeptide is useful.

1 At a recent Biotechnology Customer Partnership Meeting, PTO Senior Examiner James
Martinell described an analytical framework roughly consistent with this analysis. He stated that
when an applicant's claimed protein "is a member of a family of proteins that already are known
based upon sequence homology," that can be an effective assertion of utility.

Even if the Examiner's "common utility" criterion were correct - and it is not - the 
protease inhibitor family would meet it. It is undisputed that known members of the protease 
inhibitor family regulate the activity and effect of proteases. A person of ordinary skill in the art 
need not know any more about how the claimed invention regulates the activity and effect of 
proteases to use it, and the Examiner presents no evidence to the contrary. Instead, the Examiner 
makes the conclusory observation that a person of ordinary skill in the art would need to know 
whether, for example, any given protease inhibitor regulates the activity and effect of proteases. 
The Examiner then goes on to assume that the only use for GAPIP absent knowledge as to how 
this member of the protease inhibitor family actually works is further study of GAPIP itself. 

Not so. As demonstrated by Appellants, knowledge that GAPIP is a protease inhibitor is 
more than sufficient to make it useful for the diagnosis and treatment of cancer and immune 
disorders. Indeed, GAPIP has been shown to be expressed in cancer and immune cells. The 
Examiner must accept these facts to be true unless the Examiner can provide evidence or sound 
scientific reasoning to the contrary. But the Examiner has not done so. 

C. The uses of GAPIP in toxicology testing, drug discovery, and disease 
diagnosis are practical uses beyond mere study of the invention itself 

There is no authority for the proposition that use as a tool for research is not a substantial
utility. Indeed, the Patent Office itself has recognized that just because an invention is used in a
research setting does not mean that it lacks utility (Section 2107.01 of the Manual of Patent
Examining Procedure, 8th Edition, August 2001, under the heading I. Specific and Substantial
Requirements, Research Tools):

Many research tools such as gas chromatographs, screening assays, and nucleotide 
sequencing techniques have a clear, specific and unquestionable utility (e.g., they 
are useful in analyzing compounds). An assessment that focuses on whether an 
invention is useful only in a research setting thus does not address whether the 
specific invention is in fact "useful" in a patent sense. Instead, Office personnel 
must distinguish between inventions that have a specifically identified substantial 
utility and inventions whose asserted utility requires further research to identify or 
reasonably confirm. 
The PTO's actual practice has been, at least until the present, consistent with that
approach. It has routinely issued patents for inventions whose only use is to facilitate research, 
such as DNA ligases, acknowledged by the PTO's Training Materials to be useful. 

The subset of research uses that are not "substantial" utilities is limited. It consists only 
of those uses in which the claimed invention is to be an object of further study, thus merely 
inviting further research on the invention itself. This follows from Brenner, in which the U.S. 
Supreme Court held that a process for making a compound does not confer a substantial benefit 
where the only known use of the compound was to be the object of further research to determine 
its use. Id. at 535. Similarly, in Kirk, the Court held that a compound would not confer 
substantial benefit on the public merely because it might be used to synthesize some other, 
unknown compound that would confer substantial benefit. Kirk, 376 F.2d at 940, 945. ("What 
Applicants are really saying to those in the art is take these steroids, experiment, and find what 
use they do have as medicines.") Nowhere do those cases state or imply, however, that a 
material cannot be patentable if it has some other, additional beneficial use in research. 

Such beneficial uses beyond studying the claimed invention itself have been 
demonstrated, in particular those described in the Furness Declaration. The Furness Declaration 
demonstrates that the claimed invention is a tool, rather than an object, of research, and it 
demonstrates exactly how that tool is used. Without the claimed invention, it would be more 
difficult to generate information regarding the properties of tissues, cells, drug candidates and 
toxins apart from additional information about the polypeptide itself. 

The claimed invention has numerous other uses as a research tool, each of which alone is 
a "substantial utility." These include uses in drug screening (e.g., Specification at page 38).

D. The Patent Examiner Failed to Demonstrate That a Person of Ordinary Skill 
in the Art Would Reasonably Doubt the Utility of the Claimed Invention 

The 6/17/02 Office Action has also set forth the novel theory that the central dogma of 
molecular biology (i.e., DNA directs transcription of messenger RNA which in turn directs 
translation of protein) somehow does not apply to the discoveries of the present application. 
That is, the nucleotide sequence of SEQ ID NO:2 (which encodes the polypeptide of SEQ ID 
NO:1) was determined from a human uterus cDNA library. That cDNA library in turn was made
from messenger RNA isolated from human tissue. See the Specification, for example, at pages
38-39. Thus, the nucleotide sequences of the present invention are expressed sequences. The
6/17/02 Office Action asserts that the existence of an expressed mRNA does not ensure that the
protein encoded by the mRNA will be translated and, hence, the claimed subject matter lacks 
patentable utility. 

Regulation of gene expression occurs at many levels, including transcription, splicing, 
polyadenylation, mRNA stability, mRNA transport and compartmentalization, translation 
efficiency, protein modification, and protein turnover. While steady state mRNA levels are not 
always directly proportional to the amount of protein produced in a cell, mRNA levels are 
routinely used as an indicator of protein expression. Countless scientific publications have been
based on data relating to mRNA levels when the polypeptide encoded by the mRNA was
unknown or difficult to detect. Moreover, mRNA levels are usually a good indicator of protein
levels in a cell. The 6/17/02 Office Action cites an example of inhibition of translation
initiation; however, this example represents a comparatively unusual mechanism of gene 
regulation. According to B. Lewin [(1997) Genes VI Oxford University Press, Inc. New York, 
NY]: 

Transcription of a gene in the active state is controlled at the stage of initiation, 
that is, by the interaction of RNA polymerase with its promoter. This is now 
becoming susceptible to study in the in vitro systems... For most genes, this is a 
major control point; probably it is the most common level of regulation. [page
847, emphasis added]

But having acknowledged that control of gene expression can occur at multiple 
stages, and that production of RNA cannot inevitably be equated with production 
of protein, it is clear that the overwhelming majority of regulatory events occur at 
the initiation of transcription. Regulation of tissue-specific gene transcription 
lies at the heart of eukaryotic differentiation. [pages 847-848, emphasis added]

Thus the question is not whether there is the potential for post-transcriptional regulation 
of SEQ ID NO:1 expression but whether one skilled in the art would have a reasonable
expectation that SEQ ID NO:1 expression correlates with the levels of SEQ ID NO:2 mRNA.
Applicants need only prove a "substantial likelihood" of utility; certainty is not required.
Brenner v. Manson, 383 U.S. 519, 532, 148 USPQ 689 (1966). In the case of the instant
invention, one skilled in the art would be imprudent in assuming, a priori, that protein levels did
not correspond to mRNA levels and that levels of SEQ ID NO:1 were controlled predominantly
in a post-transcriptional manner, thereby dismissing the significance of mRNA levels. Inasmuch 
as the predictive value of mRNA levels applies to the "utility" of Appellants' invention, 
Appellants request reversal of the rejection. 

IV. By Requiring the Patent Applicant to Assert a Particular or Unique Utility, the 
Patent Examination Utility Guidelines and Training Materials Applied by the 
Patent Examiner Misstate the Law 

There is an additional, independent reason to reverse the rejections: to the extent the 
rejections are based on Revised Interim Utility Examination Guidelines (64 FR 71427, 
December 21, 1999), the final Utility Examination Guidelines (66 FR 1092, January 5, 2001) 
and/or the Revised Interim Utility Guidelines Training Materials (USPTO Website 
www.uspto.gov, March 1, 2000), the Guidelines and Training Materials are themselves 
inconsistent with the law. 

The Training Materials, which direct the Examiners regarding how to apply the Utility
Guidelines, address the issue of specificity with reference to two kinds of asserted utilities:
"specific" utilities, which meet the statutory requirements, and "general" utilities, which do not.
The Training Materials define a "specific utility" as follows:

A [specific utility] is specific to the subject matter claimed. This contrasts to 
general utility that would be applicable to the broad class of invention. For 
example, a claim to a polynucleotide whose use is disclosed simply as "gene 
probe" or "chromosome marker" would not be considered to be specific in the 
absence of a disclosure of a specific DNA target. Similarly, a general statement of 
diagnostic utility, such as diagnosing an unspecified disease, would ordinarily be 
insufficient absent a disclosure of what condition can be diagnosed. 

The Training Materials distinguish between "specific" and "general" utilities by assessing 
whether the asserted utility is sufficiently "particular," i.e., unique (Training Materials at p.52) as 
compared to the "broad class of invention." (In this regard, the Training Materials appear to 
parallel the view set forth in Stephen G. Kunin, Written Description Guidelines and Utility 
Guidelines, 82 J.P.T.O.S. 77, 97 (Feb. 2000) ("With regard to the issue of specific utility the
question to ask is whether or not a utility set forth in the specification is particular to the claimed 
invention.").) 
Such "unique" or "particular" utilities never have been required by the law. To meet the
utility requirement, the invention need only be "practically useful," Natta, 480 F.2d at 1397,
and confer a "specific benefit" on the public. Brenner, 383 U.S. at 534. Thus incredible
"throwaway" utilities, such as trying to "patent a transgenic mouse by saying it makes great snake
food," do not meet this standard. Karen Hall, Genomic Warfare, The American Lawyer 68 (June
2000) (quoting John Doll, Chief of the Biotech Section of USPTO). 

This does not preclude, however, a general utility, contrary to the statement in the 
Training Materials where "specific utility" is defined (page 5). Practical real-world uses are not 
limited to uses that are unique to an invention. The law requires that the practical utility be 
"definite," not particular. Montedison, 664 F.2d at 375. Appellant is not aware of any court that 
has rejected an assertion of utility on the grounds that it is not "particular" or "unique" to the 
specific invention. Where courts have found utility to be too "general," it has been in those cases 
in which the asserted utility in the patent disclosure was not a practical use that conferred a 
specific benefit. That is, a person of ordinary skill in the art would have been left to guess as to 
how to benefit at all from the invention. In Kirk, for example, the CCPA held the assertion that a 
man-made steroid had "useful biological activity" was insufficient where there was no 
information in the specification as to how that biological activity could be practically used. Kirk, 
376 F.2d at 941.

The fact that an invention can have a particular use does not provide a basis for requiring 
a particular use. See Brana, supra (disclosure describing a claimed antitumor compound as 
being homologous to an antitumor compound having activity against a "particular" type of cancer 
was determined to satisfy the specificity requirement). "Particularity" is not and never has been 
the sine qua non of utility; it is, at most, one of many factors to be considered. 

As described supra, broad classes of inventions can satisfy the utility requirement so long 
as a person of ordinary skill in the art would understand how to achieve a practical benefit from 
knowledge of the class. Only classes that encompass a significant portion of nonuseful members 
would fail to meet the utility requirement. Supra § III.B. (Montedison, 664 F.2d at 374-75).

The Training Materials fail to distinguish between broad classes that convey information 
of practical utility and those that do not, lumping all of them into the latter, unpatentable category 
of "general" utilities. As a result, the Training Materials paint with too broad a brush. 
Rigorously applied, they would render unpatentable whole categories of inventions heretofore
considered to be patentable, and that have indisputably benefitted the public, including the
claimed invention. See supra § III.B. Thus the Training Materials cannot be applied
consistently with the law. 

Issue 2 - Utility rejection under 35 U.S.C. § 112, first paragraph

The rejection set forth in the 6/17/02 Office Action is based on the assertions discussed
above, i.e., that the claimed invention lacks patentable utility. To the extent that the rejection
under § 112, first paragraph, is based on the improper allegation of lack of patentable utility
under § 101, it fails for the same reasons. 

Issue 3 - Enablement rejection under 35 U.S.C. § 112, first paragraph

In addition, claims 3, 5-6, 8, 11-12, 14-17 and 20-21 have been rejected as failing to meet 
the enablement requirement of 35 U.S.C. §112, first paragraph, because the Specification 
allegedly does not describe how to make the claimed antibodies. The Examiner does not dispute 
that the present application describes how to make an antibody which specifically binds to a 
polypeptide comprising the amino acid sequence of SEQ ID NO:1. (See 6/17/02 Office Action,
at pages 6-7.) However, the Examiner alleges that the present disclosure does not describe how to
make (a) an antibody which specifically binds to a polypeptide comprising a naturally occurring
amino acid sequence at least 90% identical to the amino acid sequence of SEQ ID NO:1; and (b)
an antibody which specifically binds to an immunogenic fragment of the amino acid sequence of
SEQ ID NO:1. Such, however, is not the case.

At the outset, note that this rejection should not apply to claim 21, which recites an 
antibody which specifically binds to a polypeptide comprising the amino acid sequence of SEQ 
ID NO:1. Also, claim 20 should be considered separately from claims 3, 5, 6, 8, 11 and 14-17.
That is, claim 20 recites an antibody which specifically binds to an immunogenic fragment
having at least 15 contiguous amino acid residues of a polypeptide comprising the amino acid
sequence of SEQ ID NO:1. Separate reasons for patentability of claim 20 are discussed below.

The Examiner does not appear to dispute that conventional methods for making
antibodies could be used to make antibodies which specifically bind to a polypeptide comprising
a naturally occurring amino acid sequence at least 90% sequence identical to the amino acid
sequence of SEQ ID NO:1 or to immunogenic fragments of SEQ ID NO:1. Instead, the
Examiner asserts that the present disclosure is deficient because one of skill in the art would not
be able to make the variant polypeptides and immunogenic fragments of SEQ ID NO:1 per se
and, hence, without the variant polypeptides and immunogenic fragments, one would not be able
to make antibodies which specifically bind to those variant polypeptides and immunogenic
fragments. On the contrary, the Specification is sufficient in this regard.

Note that claim 3 recites not only that the variant polypeptides are at least 90% identical
to SEQ ID NO:1, but also have "a naturally-occurring amino acid sequence." Through the
process of natural selection, nature will have determined the appropriate amino acid sequences.
Given the information provided by SEQ ID NO:1 (the amino acid sequence of GAPIP) and SEQ
ID NO:2 (the polynucleotide sequence encoding GAPIP), one of skill in the art would be able to
routinely obtain "a naturally-occurring amino acid sequence at least 90% identical to the amino
acid sequence of SEQ ID NO:1." For example, the identification of relevant polynucleotides
could be performed by hybridization and/or PCR techniques that were well-known to those
skilled in the art at the time the subject application was filed and/or described throughout the
Specification of the instant application. For example:

As used herein, the term "stringent conditions" refers to conditions which permit 
hybridization between polynucleotides and the claimed polynucleotides. Stringent 
conditions can be defined by salt concentration, the concentration of organic 
solvent (e.g., formamide), temperature, and other conditions well known in the 
art. In particular, stringency can be increased by reducing the concentration of 
salt, increasing the concentration of formamide, or raising the hybridization 
temperature. 

For example, stringent salt concentration will ordinarily be less than about 750 
mM NaCl and 75 mM trisodium citrate, preferably less than about 500 mM NaCl
and 50 mM trisodium citrate, and most preferably less than about 250 mM NaCl 
and 25 mM trisodium citrate. Low stringency hybridization can be obtained in the 
absence of organic solvent, e.g., formamide, while high stringency hybridization 
can be obtained in the presence of at least about 35% formamide, and most 
preferably at least about 50% formamide. Stringent temperature conditions will 
ordinarily include temperatures of at least about 30°C, more preferably of at least 
about 37°C, and most preferably of at least about 42°C. Varying additional 
parameters, such as hybridization time, the concentration of detergent, e.g., 
sodium dodecyl sulfate (SDS), and the inclusion or exclusion of carrier DNA, are
well known to those skilled in the art. Various levels of stringency are 
accomplished by combining these various conditions as needed. In a preferred 
embodiment, hybridization will occur at 30°C in 750 mM NaCl, 75 mM trisodium 
citrate, and 1% SDS. In a more preferred embodiment, hybridization will occur at 
37°C in 500 mM NaCl, 50 mM trisodium citrate, 1% SDS, 35% formamide, and 
100 µg/ml denatured salmon sperm DNA (ssDNA). In a most preferred
embodiment, hybridization will occur at 42°C in 250 mM NaCl, 25 mM trisodium
citrate, 1% SDS, 50% formamide, and 200 µg/ml ssDNA. Useful variations on
these conditions will be readily apparent to those skilled in the art. 

The washing steps which follow hybridization can also vary in stringency. Wash 
stringency conditions can be defined by salt concentration and by temperature. As 
above, wash stringency can be increased by decreasing salt concentration or by 
increasing temperature. For example, stringent salt concentration for the wash 
steps will preferably be less than about 30 mM NaCl and 3 mM trisodium citrate, 
and most preferably less than about 15 mM NaCl and 1.5 mM trisodium citrate. 
Stringent temperature conditions for the wash steps will ordinarily include 
temperature of at least about 25°C, more preferably of at least about 42°C, and 
most preferably of at least about 68°C. In a preferred embodiment, wash steps 
will occur at 25°C in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS. In a 
more preferred embodiment, wash steps will occur at 42°C in 15 mM NaCl, 1.5 
mM trisodium citrate, and 0.1% SDS. In a most preferred embodiment, wash 
steps will occur at 68°C in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% 
SDS. Additional variations on these conditions will be readily apparent to those 
skilled in the art. 

(Specification at page 12, line 12 to page 13, line 12) 

In one aspect, hybridization with PCR probes which are capable of detecting 
polynucleotide sequences, including genomic sequences, encoding GAPIP or 
closely related molecules may be used to identify nucleic acid sequences which 
encode GAPIP. The specificity of the probe, whether it is made from a highly 
specific region, e.g., the 5' regulatory region, or from a less specific region, e.g., a 
conserved motif, and the stringency of the hybridization or amplification 
(maximal, high, intermediate, or low), will determine whether the probe identifies 
only naturally occurring sequences encoding GAPIP, allelic variants, or related 
sequences. (Specification at page 34, lines 2-8) 

Probes may also be used for the detection of related sequences, and should 
preferably have at least 50% sequence identity to any of the GAPIP encoding 
sequences. The hybridization probes of the subject invention may be DNA or 
RNA and may be derived from the sequence of SEQ ID NO:2 or from genomic 
sequences including promoters, enhancers, and introns of the GAPIP gene. 
(Specification at page 34, lines 9-12) 
See also Example VI at pages 44-45.

Thus, one skilled in the art need not make and test vast numbers of polypeptides that are 
based on the amino acid sequence of SEQ ID NO:1. Instead, one skilled in the art need only
screen a cDNA library or use appropriate PCR conditions to identify relevant 
polynucleotides/polypeptides that already exist in nature. By adjusting the nature of the probe or 
nucleic acid (i.e., non-conserved, conserved or highly conserved) and the conditions of 
hybridization (maximum, high, intermediate or low stringency), one can obtain variant 
polynucleotides of SEQ ID NO:2 which, in turn, will allow one to make the variant polypeptides 
of SEQ ID NO:1 recited by the present claims. Conventional methods for making antibodies,
such as those described at pages 26-28 of the Specification, could then be used to make 
antibodies which specifically bind to the recited polypeptide variants. 

Accordingly, the Abaza et al. document cited by the Examiner relating to structure- 
function relationships in proteins is simply not germane to whether one can make and use the 
polypeptide variants recited by the present claims. That is, regardless of the precise functional 
characteristics of the SEQ ID NO:1 variants, one can still make those polypeptide variants, and
antibodies which specifically bind to the variants, using the disclosure provided by the present 
Specification. The antibodies could then be used in, for example, diagnostic testing, drug 
discovery, expression profiling, etc. (See, e.g., Furness Declaration). 

Furthermore, the Board's attention is also directed to the enclosed reference by Brenner et 
al. ("Assessing sequence comparison methods with reliable structurally identified distant 
evolutionary relationships," Proc. Natl. Acad. Sci. USA (1998) 95:6073-6078) (Reference No. 
5). Through exhaustive analysis of a data set of proteins with known structural and functional 
relationships and with <90% overall sequence identity, Brenner et al. have determined that 30% 
identity is a reliable threshold for establishing evolutionary homology between two sequences 
aligned over at least 150 residues. (Brenner et al., pages 6073 and 6076.) Furthermore, local 
identity is particularly important in this case for assessing the significance of the alignments, as 
Brenner et al. further report that >40% identity over at least 70 residues is reliable in signifying 
homology between proteins. (Brenner et al., page 6076.) 

Claim 3 recites, inter alia, antibodies which specifically bind to "a polypeptide 
comprising ... a naturally occurring amino acid sequence at least 90% identical to the amino 
acid sequence of SEQ ID NO:1." In accordance with Brenner et al., naturally occurring
molecules may exist which could be characterized as growth-associated protease inhibitor heavy
chain precursors and which have as little as 30% identity over at least 150 residues to SEQ ID
NO:1. The "90% variants" recited by the present claims have a variation that is far less than that
of all potential growth-associated protease inhibitor heavy chain precursors related to SEQ ID
NO:1, i.e., those growth-associated protease inhibitor heavy chain precursors having as little as
30% identity over at least 150 residues to SEQ ID NO:1. Therefore, one would expect the SEQ
ID NO:1 variants recited by the present claims to have the functional activities of a growth-
associated protease inhibitor heavy chain precursor. 

Furthermore, the Examiner has asserted that one of skill in the art could not make and use 
an isolated antibody which specifically binds an immunogenic fragment of SEQ ID NO:1. Such,
however, is not the case. 

At pages 14-15, the Specification describes the polynucleotide of SEQ ID NO;2, the 
polypeptide encoded by that polynucleotide, i.e., SEQ ID NO:l, and chemical and structural 
characteristics thereof. The polypeptide and fragments thereof can be produced by either 
recombinant means (see, e.g., the Specification at pages 18-23) or by chemical synthesis (see, 
e.g., the Specification at page 18, lines 15-26; and page 23, lines 27-31). As discussed at length 
above in connection with the "utility" rejection, the use of antibodies and the polypeptides to 
which they specifically bind for diagnosis of diseases, for toxicology testing, and for drug 
discovery are well known in the art, e.g., via the use of expression profiling. Such uses are also 
described in the Specification, e.g., at pages 33-38. Hence, the requirement for providing 
objective enablement has been met. 

The Examiner questions, in particular, whether the present Specification provides 
sufficient guidance to enable the identification of immunogenic fragments of SEQ ID NO:1 (see 
the 6/17/02 Office Action at page 7). The Examiner's concerns are untenable, as the 
Specification is fully sufficient in this regard. 

First note that at page 8, lines 5-6, "immunologically active" is defined as "the capability 
of the natural, recombinant, or synthetic GAPIP, or of any oligopeptide thereof, to induce a 
specific immune response in appropriate animals or cells and to bind with specific antibodies." 
Specific binding is further defined at page 12 as meaning: 

... that interaction between a protein or peptide and an agonist, an antibody, or an 
antagonist. The interaction is dependent upon the presence of a particular 
structure of the protein, e.g., the antigenic determinant or epitope, recognized by 
the binding molecule. For example, if an antibody is specific for epitope "A," the 
presence of a polypeptide containing the epitope A, or the presence of free 
unlabeled A, in a reaction containing free labeled A and the antibody will reduce 
the amount of labeled A that binds to the antibody. 

Methods of producing specifically binding antibodies are described in the Specification, 
for example, at pages 26-28. In this regard, note the paragraph at page 27, lines 3-9, which 
describes fragment sizes of GAPIP for raising antibodies. See also page 48 which describes the 
production of antibodies to fragments of GAPIP, including the description of how to identify 
appropriate immunogenic sites of GAPIP: 

... the GAPIP amino acid sequence is analyzed using LASERGENE software to 
determine regions of high immunogenicity, and a corresponding oligopeptide is 
synthesized and used to raise antibodies by means known to those of skill in the art. 
Methods for selection of appropriate epitopes, such as those near the C-terminus 
or in hydrophilic regions, are well described in the art. (See, e.g., Ausubel supra, 
ch. 11.) (Specification at page 48, lines 24-28) 

Not only is the Examiner's position factually incorrect (as shown above) but it is also 
legally in error. As set forth in In re Marzocchi, 169 USPQ 367, 369 (CCPA 1971): 

The first paragraph of § 112 requires nothing more than objective enablement. 
[emphasis added] How such a teaching is set forth, either by the use of illustrative 
examples or by broad terminology, is of no importance. 

As a matter of Patent Office practice, then, a specification disclosure which 
contains a teaching of the manner and process of making and using the invention 
in terms which correspond in scope to those used in describing and defining the 
subject matter sought to be patented must be taken as in compliance with the 
enabling requirement of the first paragraph of § 112 unless there is reason to 
doubt the objective truth of the statements contained therein which must be relied 
on for enabling support. 

Contrary to the standard set forth in Marzocchi, the Examiner has failed to provide any 
reasons why one would doubt that the guidance provided by the present Specification would 
enable one to make and use the recited antibodies which specifically bind to the variants and 
fragments of SEQ ID NO:1. Therefore, a prima facie case for non-enablement has not been 
established. 

For at least the above reasons, reversal of this rejection is requested. 

Issue 4 - Written description rejection 

Claims 3, 5-6, 8, 11-12, 14-17 and 20-21 were rejected under 35 U.S.C. § 112, first 
paragraph, as allegedly being based on a Specification which provides an inadequate written 
description of what is claimed. The rejection of claims 3, 5-6, 8, 11-12, 14-17 and 20-21 is in 
error; the claims meet the written description requirement of 35 U.S.C. § 112, first paragraph. 

The Examiner appears to be taking the position that every single member of the claimed 
genus of polypeptides "and the antibodies that bind these fragments and variants" must be 
specifically disclosed by the Specification, otherwise an inadequate written description has been 
set forth. (See 6/17/02 Office Action at page 5.) However, this position is erroneous; no such 
disclosure is required for an adequate written description. 

At the outset, note that this rejection should not apply to claim 21, which recites an 
antibody which specifically binds to a polypeptide comprising the amino acid sequence of SEQ 
ID NO:1. Also, claim 20 should be considered separately from claims 3, 5, 6, 8, 11 and 14-17. 
That is, claim 20 recites an antibody which specifically binds to an immunogenic fragment 
having at least 15 contiguous amino acid residues of a polypeptide comprising the amino acid 
sequence of SEQ ID NO:1. Separate reasons for patentability of claim 20 are discussed below. 

The requirements necessary to fulfill the written description requirement of 35 U.S.C. § 
112, first paragraph, are well established by case law. 

... the applicant must also convey with reasonable clarity to those skilled in the 
art that, as of the filing date sought, he or she was in possession of the invention. 
The invention is, for purposes of the "written description" inquiry, whatever is 
now claimed. Vas-Cath, Inc. v. Mahurkar, 19 USPQ2d 1111, 1117 (Fed. Cir. 
1991) 

The Board's attention is also drawn to the Patent and Trademark Office's own 
"Guidelines for Examination of Patent Applications Under the 35 U.S.C. Sec. 112, para. 1", 
published January 5, 2001, which provide that: 

An applicant may also show that an invention is complete by disclosure of 
sufficiently detailed, relevant identifying characteristics 42 which provide evidence 
that applicant was in possession of the claimed invention, 43 i.e., complete or 
partial structure, other physical and/or chemical properties, functional 
characteristics when coupled with a known or disclosed correlation between 
function and structure, or some combination of such characteristics. 44 What is 
conventional or well known to one of ordinary skill in the art need not be 
disclosed in detail. 45 If a skilled artisan would have understood the inventor to be 
in possession of the claimed invention at the time of filing, even if every nuance 
of the claims is not explicitly described in the specification, then the adequate 
description requirement is met. 46 

Thus, the written description standard is fulfilled by both what is specifically disclosed 
and what is conventional or well known to one skilled in the art. 

a. The specification provides an adequate written description of the claimed 
"variants" and "fragments" of SEQ ID NO:1 

The subject matter encompassed by claims 3, 5-6, 8, 11-12, 14-17 and 20-21 is either 
disclosed by the Specification or is conventional or well known to one skilled in the art. 

First note that the "variant" and "fragment" language of independent claim 3 recites an 
isolated antibody which specifically binds to a polypeptide comprising "a naturally occurring 
amino acid sequence at least 90% identical to the amino acid sequence of SEQ ID NO:1", or "an 
immunogenic fragment of a polypeptide comprising the amino acid sequence of SEQ ID NO:1." 
The polypeptide sequence of SEQ ID NO:1 is explicitly disclosed in the specification. See, for 
example, the Sequence Listing. Variants of SEQ ID NO:1 are described in the Specification at, 
for example, page 3, lines 4-5; page 6, lines 2-5 and 9-15; and page 15, lines 12-15. Fragments 
of SEQ ID NO:1 are described in the Specification at, for example, page 3, lines 4-5; page 4, 
lines 2-4; and page 7, lines 1-7. 

One of ordinary skill in the art would recognize polypeptide sequences which are variants 
at least 90% identical to SEQ ID NO:1. Given any naturally occurring polypeptide sequence, it 
would be routine for one of skill in the art to recognize whether it was a variant of SEQ ID NO:1. 
Similarly, SEQ ID NO:1 provides the blueprint to describe any immunogenic fragment thereof. 
Accordingly, the Specification provides an adequate written description of the recited variants 
and fragments of SEQ ID NO:1. 

1. The present claims specifically define the claimed genus through the 
recitation of chemical structure 

Court cases in which "DNA claims" have been at issue (which are hence relevant to 
claims to proteins encoded by the DNA, and antibodies which specifically bind to those proteins) 
commonly emphasize that the recitation of structural features or chemical or physical properties 
is an important factor to consider in a written description analysis of such claims. For example, in 
Fiers v. Revel, 25 USPQ2d 1601, 1606 (Fed. Cir. 1993), the court stated that: 

If a conception of a DNA requires a precise definition, such as by structure, 
formula, chemical name or physical properties, as we have held, then a description 
also requires that degree of specificity. 

In a number of instances in which claims to DNA have been found invalid, the courts 
have noted that the claims attempted to define the claimed DNA in terms of functional 
characteristics without any reference to structural features. As set forth by the court in University 
of California v. Eli Lilly and Co., 43 USPQ2d 1398, 1406 (Fed. Cir. 1997): 

In claims to genetic material, however, a generic statement such as "vertebrate 
insulin cDNA" or "mammalian insulin cDNA," without more, is not an adequate 
written description of the genus because it does not distinguish the claimed genus 
from others, except by function. 

Thus, the mere recitation of functional characteristics of a DNA, without the definition of 
structural features, has been a common basis by which courts have found invalid claims to DNA. 
For example, in Lilly, 43 USPQ2d at 1407, the court found invalid for violation of the written 
description requirement the following claim of U.S. Patent No. 4,652,525: 

1. A recombinant plasmid replicable in procaryotic host containing within its 
nucleotide sequence a subsequence having the structure of the reverse transcript of 
an mRNA of a vertebrate, which mRNA encodes insulin. 

In Fiers, 25 USPQ2d at 1603, the parties were in an interference involving the following 
count: 

A DNA which consists essentially of a DNA which codes for a human fibroblast 
interferon-beta polypeptide. 

Party Revel in the Fiers case argued that its foreign priority application contained an 
adequate written description of the DNA of the count because that application mentioned a 
potential method for isolating the DNA. The Revel priority application, however, did not have a 
description of any particular DNA structure corresponding to the DNA of the count. The court 
therefore found that the Revel priority application lacked an adequate written description of the 
subject matter of the count. 

Thus, in Lilly and Fiers, nucleic acids were defined on the basis of functional 
characteristics and were found not to comply with the written description requirement of 35 
U.S.C. § 112; i.e., "an mRNA of a vertebrate, which mRNA encodes insulin" in Lilly, and "DNA 
which codes for a human fibroblast interferon-beta polypeptide" in Fiers. In contrast to the 
situation in Lilly and Fiers, the claims at issue in the present application define polypeptides in 
terms of chemical structure, rather than functional characteristics. For example, the language of 
independent claim 3 recites chemical structure to define the claimed genus: 

3. An isolated antibody which specifically binds to a polypeptide selected from the group 
consisting of: 

a) a polypeptide comprising the amino acid sequence of SEQ ID 
NO:1, 

b) a polypeptide comprising a naturally-occurring amino acid 
sequence at least 90% identical to the amino acid sequence of SEQ 
ID NO:1, said naturally-occurring amino acid sequence encoding a 
polypeptide having protease inhibitor activity, and 

c) an immunogenic fragment of a polypeptide comprising the amino 
acid sequence of SEQ ID NO:1. 

From the above it should be apparent that the claims of the subject application are 
fundamentally different from those found invalid in Lilly and Fiers. The subject matter of the 
present claims is defined in terms of the chemical structure of SEQ ID NO:1. In the present case, 
there is no reliance merely on a description of functional characteristics of the claimed antibodies 
and the polypeptides to which they specifically bind. Moreover, the recitation of functional 
characteristics (i.e., "having protease inhibitor activity" with respect to the recited variants of 
SEQ ID NO:1, and "immunogenic" with respect to the recited fragments of SEQ ID NO:1) adds 
to the structural recitations of the claims. The claims of the present application define the 
antibodies, and the polypeptides to which they specifically bind, by reciting structural features, 
and cases such as Lilly and Fiers stress that the recitation of structure is an important factor to 
consider in a written description analysis of claims of this type. By failing to base its written 
description inquiry "on whatever is now claimed," the Examiner failed to provide an appropriate 
analysis of the present claims and how they differ from those found not to satisfy the written 
description requirement in Lilly and Fiers. 

2. The present claims do not define a genus which is "highly variant" 

Furthermore, the claims at issue do not describe a genus which could be characterized as 
"highly variant". Available evidence illustrates that, rather than being a large variable genus, the 
claimed genus is of narrow scope. 

In support of this assertion, the Board's attention is directed to the reference by Brenner et 
al. ("Assessing sequence comparison methods with reliable structurally identified distant 
evolutionary relationships," Proc. Natl. Acad. Sci. USA (1998) 95:6073-6078) (of record). 
Through exhaustive analysis of a data set of proteins with known structural and functional 
relationships and with <90% overall sequence identity, Brenner et al. have determined that 30% 
identity is a reliable threshold for establishing evolutionary homology between two sequences 
aligned over at least 150 residues (Brenner et al., pages 6073 and 6076). Furthermore, local 
identity is particularly important in this case for assessing the significance of the alignments, as 
Brenner et al. further report that >40% identity over at least 70 residues is reliable in signifying 
homology between proteins (Brenner et al., page 6076). 

The present application is directed, inter alia, to antibodies which specifically bind to 
polypeptides related to human growth-associated protease inhibitor heavy chain precursor 
(GAPIP). In particular, the polypeptides are selected from amino acid sequences comprising 
SEQ ID NO:1, naturally occurring amino acid sequences at least 90% identical to SEQ ID NO:1, 
or immunogenic fragments of SEQ ID NO:1. In accordance with Brenner et al., naturally 
occurring molecules may exist which could be characterized as human growth-associated 
protease inhibitor heavy chain precursor (GAPIP) proteins and which have as little as 30% 
identity over at least 150 residues to SEQ ID NO:1. The "variant language" of the present claims 
recites a polypeptide comprising "a naturally occurring amino acid sequence at least 90% 
identical to the amino acid sequence of SEQ ID NO:1" (note that SEQ ID NO:1 has 942 amino 
acid residues). This variation is far less than that of all potential GAPIP proteins related to SEQ 
ID NO:1, i.e., those GAPIP proteins having as little as 30% identity over at least 150 residues to 
SEQ ID NO:1. 
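
The comparison drawn above can be made concrete with a short arithmetic sketch. It assumes, purely for illustration, that identity is measured across the full 942-residue length of SEQ ID NO:1 without gaps; the variable and function names are hypothetical.

```python
def min_identical(length: int, percent: int) -> int:
    """Smallest number of identical positions that reaches `percent`
    identity over `length` aligned residues (integer ceiling division)."""
    return -(-percent * length // 100)


SEQ_LENGTH = 942  # residues in SEQ ID NO:1, as stated in the brief

# "At least 90% identical" over the full length (illustrative assumption).
identical_90 = min_identical(SEQ_LENGTH, 90)
differing_90 = SEQ_LENGTH - identical_90

# 30% identity over a 150-residue alignment, per the thresholds discussed.
identical_30 = min_identical(150, 30)
differing_30 = 150 - identical_30

print(identical_90, differing_90)  # 848 94
print(identical_30, differing_30)  # 45 105
```

On these assumptions, a 90% variant may differ from SEQ ID NO:1 at no more than 94 of 942 positions, whereas a sequence at the 30% threshold may differ at 105 of only 150 aligned positions.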

3. The state of the art at the time of the present invention is further advanced 
than at the time of the Lilly and Fiers applications 

In the Lilly case, claims of U.S. Patent No. 4,652,525 were found invalid for failing to 
comply with the written description requirement of 35 U.S.C. § 112. The '525 patent claimed the 
benefit of priority of two applications, Application Serial No. 801,343 filed May 27, 1977, and 
Application Serial No. 805,023 filed June 9, 1977. In the Fiers case, party Revel claimed the 
benefit of priority of an Israeli application filed on November 21, 1979. Thus, the written 
description inquiry in those cases was based on the state of the art at essentially the "dark ages" 
of recombinant DNA technology. 

The present application has a priority date of May 7, 1998. Much has happened in the 
development of recombinant DNA technology in the 19 or so years between the filing of the 
applications involved in Lilly and Fiers and the present application. For example, the technique 
of polymerase chain reaction (PCR) was invented. Highly efficient cloning and DNA sequencing 
technology has been developed. Large databases of protein and nucleotide sequences have been 
compiled. Much of the raw material of the human and other genomes has been sequenced. With 
these remarkable advances, one of skill in the art would recognize that, given the sequence 
information of SEQ ID NO:l, and the additional extensive detail provided by the subject 
application, the present inventors were in possession of the claimed polypeptide variants and 
fragments at the time of filing of this application. 

4. Summary 

The Final Office Action of 3/11/03 failed to base its written description inquiry "on 
whatever is now claimed." Consequently, the Action did not provide an appropriate analysis of 
the present claims and how they differ from those found not to satisfy the written description 
requirement in cases such as Lilly and Fiers. In particular, the claims of the subject application 
are fundamentally different from those found invalid in Lilly and Fiers. The subject matter of the 
present claims is defined in terms of the chemical structure of SEQ ID NO:1. The courts have 
stressed that structural features are important factors to consider in a written description analysis 
of claims to nucleic acids and proteins. In addition, the genus of polypeptides defined by the 
present claims is adequately described, as evidenced by Brenner et al. Furthermore, there have 
been remarkable advances in the state of the art since the Lilly and Fiers cases, and these 
advances were given no consideration whatsoever in the position set forth by the Office Action. 

For at least the reasons set forth above, the Specification provides an adequate written 
description of the claimed subject matter, and this rejection should be reversed. 

(9) CONCLUSION 

Appellants respectfully submit that rejections for lack of utility based, inter alia, on an 
allegation of "lack of specificity" and as justified in the Revised Interim and final Utility 
Guidelines and Training Materials, are not supported in the law. Further, not only are they 
scientifically without merit, but they are not supported by any evidence or sound scientific reasoning. 
These rejections are alleged to be founded on facts in court cases such as Brenner and Kirk, yet 
those facts are clearly distinguishable from the facts of the instant application, and indeed most if 
not all nucleotide and protein sequence applications. Nevertheless, the PTO is attempting to 
mold the facts and holdings of these prior cases, "like a nose of wax,"2 to target rejections of 
claims to polypeptide and polynucleotide sequences, as well as to claims to methods of detecting 
said polynucleotide sequences, where biological activity information has not been proven by 
laboratory experimentation, and they have done so by ignoring perfectly acceptable utilities fully 
disclosed in the specifications as well as well-established utilities known to those of skill in the 
art. As is disclosed in the specification, and even more clearly, as one of ordinary skill in the art 
would understand, the claimed invention has well-established, specific, substantial and credible 
utilities. The rejections are, therefore, improper and should be reversed. 

Moreover, to the extent the above rejections were based on the Revised Interim and final 
Examination Guidelines and Training Materials, those portions of the Guidelines and Training 
Materials that form the basis for the rejections should be determined to be inconsistent with the 
law. 

Finally, the enablement and written description rejections based on alleged deficiencies 
pertaining to antibodies which specifically bind the recited variants and fragments of SEQ ID 
NO:l should also be reversed for at least the reasons set forth herein. 



2 "The concept of patentable subject matter under §101 is not 'like a nose of wax which 
may be turned and twisted in any direction * * *.' White v. Dunbar, 119 U.S. 47, 51." (Parker v. 
Flook, 198 USPQ 193 (US SupCt 1978)) 

Due to the urgency of this matter and its economic and public health implications, an 
expedited review of this appeal is earnestly solicited. 

If the USPTO determines that any additional fees are due, the Commissioner is hereby 
authorized to charge Deposit Account No. 09-0108. 

This brief is enclosed in triplicate. 



3160 Porter Drive 
Palo Alto, California 94304 
Phone: (650) 855-0555 
Fax: (650) 849-8886 

Enclosures: 

1. Brenner et al., Proc. Natl. Acad. Sci. U.S.A. 95:6073-78 (1998). 

2. John C. Rockett, et al., Differential gene expression in drug metabolism and toxicology: 
practicalities, problems, and potential. Xenobiotica 29:655-691 (July 1999). 

3. Emile F. Nuwaysir, et al., Microarrays and Toxicology: The Advent of Toxicogenomics. 
Molecular Carcinogenesis 24:153-159 (1999). 

4. Sandra Steiner and N. Leigh Anderson, Expression profiling in toxicology - potentials 
and limitations. Toxicology Letters 112-13:467-471 (2000). 

5. Email from the primary investigator, Dr. Cynthia Afshari to an Incyte employee, dated 
July 3, 2000, as well as the original message to which she was responding. 



Respectfully submitted, 



INCYTE CORPORATION 





Richard C. Ekstrom 
Reg. No. 37,027 

Direct Dial Telephone: (650) 843-7352 



Date: 




113597 






09/828,423 



Docket No.: PF-0505-2 DIV 
APPENDIX - CLAIMS ON APPEAL 

3. An isolated antibody which specifically binds to a polypeptide selected from the 
group consisting of: 

a) a polypeptide comprising the amino acid sequence of SEQ ID NO:1, 

b) a polypeptide comprising a naturally-occurring amino acid sequence at least 90% 
identical to the amino acid sequence of SEQ ID NO:1, said naturally-occurring 
amino acid sequence encoding a polypeptide having protease inhibitor activity, 
and 

c) an immunogenic fragment of a polypeptide comprising the amino acid sequence of 
SEQ ID NO:1. 

5. The antibody of claim 3, wherein the antibody is: 

(a) a chimeric antibody; 

(b) a single chain antibody; 

(c) a Fab fragment; 

(d) a F(ab')2 fragment; or 

(e) a humanized antibody. 

6. A composition comprising an antibody of claim 3 and an acceptable excipient. 

8. A composition of claim 6, wherein the antibody is labeled. 

11. A polyclonal antibody produced by a method of claim 10. 

12. A composition comprising the polyclonal antibody of claim 11 and a suitable carrier. 

14. A monoclonal antibody produced by a method of claim 13. 

15. A composition comprising the monoclonal antibody of claim 14 and a suitable 
carrier. 

16. The antibody of claim 3, wherein the antibody is produced by screening a Fab 
expression library. 

17. The antibody of claim 3, wherein the antibody is produced by screening a 
recombinant immunoglobulin library. 

20. An isolated antibody of claim 3, which specifically binds to an immunogenic 
fragment having at least 15 contiguous amino acid residues of a polypeptide comprising the 
amino acid sequence of SEQ ID NO:1. 

21. An isolated antibody of claim 3, which specifically binds to a polypeptide comprising 
the amino acid sequence of SEQ ID NO:1. 

ABSTRACT Pairwise sequence comparison methods have 
been assessed using proteins whose relationships are known 
reliably from their structures and functions, as described in 
the SCOP database [Murzin, A. G., Brenner, S. E., Hubbard, T. 
& Chothia, C. (1995) J. Mol. Biol. 247, 536-540]. The evaluation 
tested the programs BLAST [Altschul, S. F., Gish, W., 
Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 
215, 403-410], WU-BLAST2 [Altschul, S. F. & Gish, W. (1996) 
Methods Enzymol. 266, 460-480], FASTA [Pearson, W. R. & 
Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444-2448], 
and SSEARCH [Smith, T. F. & Waterman, M. S. (1981) J. Mol. 
Biol. 147, 195-197] and their scoring schemes. The error rate 
of all algorithms is greatly reduced by using statistical scores 
to evaluate matches rather than percentage identity or raw 
scores. The E-value statistical scores of SSEARCH and FASTA are 
reliable: the number of false positives found in our tests agrees 
well with the scores reported. However, the P-values reported 
by BLAST and WU-BLAST2 exaggerate significance by orders of 
magnitude. SSEARCH, FASTA ktup = 1, and WU-BLAST2 perform 
best, and they are capable of detecting almost all relationships 
between proteins whose sequence identities are >30%. For 
more distantly related proteins, they do much less well; only 
one-half of the relationships between proteins with 20-30% 
identity are found. Because many homologs have low sequence 
similarity, most distant relationships cannot be detected by 
any pairwise comparison method; however, those which are 
identified may be used with confidence. 

Sequence database searching plays a role in virtually every 
branch of molecular biology and is crucial for interpreting the 
sequences issuing forth from genome projects. Given the 
method's central role, it is surprising that overall and relative 
capabilities of different procedures are largely unknown. It is 
difficult to verify algorithms on sample data because this 
requires large data sets of proteins whose evolutionary 
relationships are known unambiguously and independently of the 
methods being evaluated. However, nearly all known 
homologs have been identified by sequence analysis (the method 
to be tested). Also, it is generally very difficult to know, in the 
absence of structural data, whether two proteins that lack clear 
sequence similarity are unrelated. This has meant that 
although previous evaluations have helped improve sequence 
comparison, they have suffered from insufficient, imperfectly 
characterized, or artificial test data. Assessment also has been 
problematic because high quality database sequence searching 
attempts to have both sensitivity (detection of homologs) and 
specificity (rejection of unrelated proteins); however, these 
complementary goals are linked such that increasing one 
causes the other to be reduced. 
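
The linkage between sensitivity and specificity described above can be sketched with a toy calculation. The scores and homolog labels below are hypothetical values invented for illustration; they are not data from the paper, and the function name is likewise an assumption.

```python
def sensitivity_specificity(scored_pairs, threshold):
    """Treat every pair scoring at or above `threshold` as a reported match.

    scored_pairs: iterable of (score, is_homolog) tuples.
    Returns (sensitivity, specificity).
    """
    tp = sum(1 for s, h in scored_pairs if h and s >= threshold)
    fn = sum(1 for s, h in scored_pairs if h and s < threshold)
    tn = sum(1 for s, h in scored_pairs if not h and s < threshold)
    fp = sum(1 for s, h in scored_pairs if not h and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: four true homolog pairs, four unrelated pairs.
toy = [(90, True), (70, True), (50, True), (30, True),
       (60, False), (40, False), (20, False), (10, False)]

print(sensitivity_specificity(toy, 25))  # (1.0, 0.5): permissive cutoff
print(sensitivity_specificity(toy, 65))  # (0.5, 1.0): strict cutoff
```

Lowering the cutoff recovers every homolog in the toy set but admits unrelated pairs; raising it rejects all unrelated pairs but misses half of the homologs.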

Sequence comparison methodologies have evolved rapidly, 
so no previously published tests have evaluated modern versions 
of programs commonly used. For example, parameters in 
BLAST (1) have changed, and WU-BLAST2 (2), which produces 
gapped alignments, has become available. The latest version 
of FASTA (3) previously tested was 1.6, but the current release 
(version 3.0) provides fundamentally different results in the 
form of statistical scoring. 

The previous reports also have left gaps in our knowledge. 
For example, there has been no published assessment of 
thresholds for scoring schemes more sophisticated than 
percentage identity. Thus, the widely discussed statistical scoring 
measures have never actually been evaluated on large 
databases of real proteins. Moreover, the different scoring schemes 
commonly in use have not been compared. 

Beyond these issues, there is a more fundamental question: 
in an absolute sense, how well does pairwise sequence 
comparison work? That is, what fraction of homologous proteins 
can be detected using modern database searching methods? 

In this work, we attempt to answer these questions and to 
overcome both of the fundamental difficulties that have 
hindered assessment of sequence comparison methodologies. 
First, we use the set of distant evolutionary relationships in the 
SCOP: Structural Classification of Proteins database (4), which 
is derived from structural and functional characteristics (5). 
The SCOP database provides a uniquely reliable set of 
homologs, which are known independently of sequence 
comparison. Second, we use an assessment method that jointly 
measures both sensitivity and specificity. This method allows 
straightforward comparison of different sequence searching 
procedures. Further, it can be used to aid interpretation of real 
database searches and thus provide optimal and reliable 
results. 

Previous Assessments of Sequence Comparison. Several 
previous studies have examined the relative performance of 
different sequence comparison methods. The most 
encompassing analyses have been by Pearson (6, 7), who compared 
the three most commonly used programs. Of these, the 
Smith-Waterman algorithm (8) implemented in SSEARCH (3) is the 
oldest and slowest but the most rigorous. Modern heuristics 
have provided BLAST (1) the speed and convenience to make 
it the most popular program. Intermediate between these two 
is FASTA (3), which may be run in two modes offering either 
greater speed (ktup = 2) or greater effectiveness (ktup = 1). 
Pearson also considered different parameters for each of these 
programs. 

To test the methods, Pearson selected two representative 
proteins from each of 67 protein superfamilies defined by the 
PIR database (9). Each was used as a query to search the 
database, and the matched proteins were marked as being 
homologous or unrelated according to their membership of PIR 
superfamilies. Pearson found that modern matrices and 
"ln-scaling" of raw scores improve results considerably. He also 
reported that the rigorous Smith-Waterman algorithm worked 
slightly better than FASTA, which was in turn more effective 
than BLAST. 

Abbreviation: EPQ, errors per query. 

†Present address: Department of Structural Biology, Stanford 
University, Fairchild Building D-109, Stanford, CA 94305-5126. 

‡To whom reprint requests should be addressed. e-mail: 
brenner@hyper.stanford.edu. 

Very large scale analyses of matrices have been performed 
(10), and Henikoff and Henikoff (11) also evaluated the 
effectiveness of BLAST and FASTA. Their test with BLAST 
considered the ability to detect homologs above a 
predetermined score but had no penalty for methods which also 
reported large numbers of spurious matches. The Henikoffs 
searched the SWISS-PROT database (12) and used PROSITE (13) 
to define homologous families. Their results showed that the 
BLOSUM62 matrix (14) performed markedly better than the 
extrapolated PAM-series matrices (15), which previously had 
been popular. 

A crucial aspect of any assessment is the data that are used
to test the ability of the program to find homologs. But in
Pearson's and the Henikoffs' evaluations of sequence comparison,
the correct results were effectively unknown. This is
because the superfamilies in pir and prosite are principally
created by using the same sequence comparison methods
which are being evaluated. Interdependency of data and
methods creates a "chicken and egg" problem, and means, for
example, that new methods would be penalized for correctly
identifying homologs missed by older programs. For instance,
immunoglobulin variable and constant domains are clearly
homologous, but pir places them in different superfamilies.
The problem is widespread: each superfamily in pir 48.00 with
a structural homolog is itself homologous to an average of 1.6
other pir superfamilies (16).

To surmount these sorts of difficulties, Sander and Schneider
(17) used protein structures to evaluate sequence comparison.
Rather than comparing different sequence comparison
algorithms, their work focused on determining a
length-dependent threshold of percentage identity, above which all
proteins would be of similar structure. A result of this analysis
was the HSSP equation: it states that proteins with 25% identity
over 80 residues will have similar structures, whereas shorter
alignments require higher identity. (Other studies also have
used structures (18-20), but these focused on a small number
of model proteins and were principally oriented toward
evaluating alignment accuracy rather than homology detection.)

A general solution to the problem of scoring comes from
statistical measures (i.e., E-values and P-values) based on the
extreme value distribution (23). Extreme value scoring was
implemented analytically in the blast program using the
Karlin and Altschul statistics (22, 23), and empirical approaches
have been recently added to fasta and ssearch. In
addition to being heralded as a reliable means of recognizing
significantly similar proteins (24, 25), the mathematical
tractability of statistical scores "is a crucial feature of the blast
algorithm" (1). The validity of this scoring procedure has been
tested analytically and empirically (see ref. 2 and references in
ref. 24). However, all large empirical tests used random
sequences that may lack the subtle structure found within
biological sequences (26, 27) and obviously do not contain any
real homologs. Thus, although many researchers have suggested
that statistical scores be used to rank matches (24, 25,
28), there have been no large rigorous experiments on biological
data to determine the degree to which such rankings are
superior.

A Database for Testing Homology Detection. Since the
discovery that the structures of hemoglobin and myoglobin are
very similar though their sequences are not (29), it has been
apparent that comparing structures is a more powerful (if less
convenient) way to recognize distant evolutionary relationships
than comparing sequences. If two proteins show a high
degree of similarity in their structural details and function, it
is very probable that they have an evolutionary relationship
though their sequence similarity may be low.

The recent growth of protein structure information combined
with the comprehensive evolutionary classification in
the scop database (4, 5) have allowed us to overcome previous
limitations. With these data, we can evaluate the performance
of sequence comparison methods on real protein sequences
whose relationships are known confidently. The scop database
uses structural information to recognize distant homologs, the
large majority of which can be determined unambiguously.
These superfamilies, such as the globins or the immunoglobulins,
would be recognized as related by the vast majority of the
biological community despite the lack of high sequence
similarity.

From scop, we extracted the sequences of domains of
proteins in the Protein Data Bank (pdb) (30) and created two
databases. One (PDB90D-B) has domains which were all <90%
identical to any other, whereas the other (PDB40D-B) had those
<40% identical. The databases were created by first sorting all
protein domains in scop by their quality and making a list. The
highest quality domain was selected for inclusion in the
database and removed from the list. Also removed from the list
(and discarded) were all other domains above the threshold
level of identity to the selected domain. This process was
repeated until the list was empty. The PDB40D-B database
contains 1,323 domains, which have 9,044 ordered pairs of
distant relationships, or ~0.5% of the total 1,749,006 ordered
pairs. In PDB90D-B, the 2,079 domains have 53,988 relationships,
representing 1.2% of all pairs. Low complexity regions
of sequence can achieve spurious high scores, so these were
masked in both databases by processing with the seg program
(27) using recommended parameters: 12 1.8 2.0. The databases
used in this paper are available from http://sss.stanford.edu/sss/,
and databases derived from the current version of scop
may be found at http://scop.mrc-lmb.cam.ac.uk/scop/.
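The selection procedure described above amounts to a greedy filter. A minimal sketch in Python follows, assuming a per-domain quality score and a pairwise identity function are already available. The names `select_nonredundant`, `quality`, and `identity` are illustrative and not taken from the original work, and treating a domain exactly at the threshold as redundant is also an assumption.

```python
from typing import Callable, Dict, List


def select_nonredundant(
    domains: List[str],
    quality: Dict[str, float],
    identity: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Greedily keep the best remaining domain and discard its near-duplicates."""
    # Sort once, best quality first; ties are broken by name for reproducibility.
    remaining = sorted(domains, key=lambda d: (-quality[d], d))
    selected: List[str] = []
    while remaining:
        best = remaining.pop(0)
        selected.append(best)
        # Discard every other domain at or above the identity threshold to `best`.
        remaining = [d for d in remaining if identity(best, d) < threshold]
    return selected


def ordered_pairs(n: int) -> int:
    """Number of ordered (query, target) pairs among n distinct sequences."""
    return n * (n - 1)
```

For n sequences searched all against all, the number of ordered pairs is n(n - 1), so `ordered_pairs(1323)` reproduces the total of 1,749,006 quoted for PDB40D-B.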

Analyses from both databases were generally consistent, but
PDB40D-B focuses on distantly related proteins and reduces the
heavy overrepresentation in the pdb of a small number of
families (31, 32), whereas PDB90D-B (with more sequences)
improves evaluations of statistics. Except where noted otherwise,
the distant homolog results here are from PDB40D-B.
Although the precise numbers reported here are specific to the
structural domain databases used, we expect the trends to be
general.

Assessment Data and Procedure. Our assessment of sequence
comparison may be divided into four different major
categories of tests. First, using just a single sequence comparison
algorithm at a time, we evaluated the effectiveness of
different scoring schemes. Second, we assessed the reliability
of scoring procedures, including an evaluation of the validity
of statistical scoring. Third, we compared sequence comparison
algorithms (using the optimal scoring scheme) to determine
their relative performance. Fourth, we examined the
distribution of homologs and considered the power of pairwise
sequence comparison to recognize them. All of the analyses
used the databases of structurally identified homologs and a
new assessment criterion.

The analyses tested blast (1), version 1.4.9MP, and
wu-blast2 (2), version 2.0a13MP. Also assessed was the fasta
package, version 3.0t76 (3), which provided fasta and the
ssearch implementation of Smith-Waterman (8). For
ssearch and fasta, we used BLOSUM45 with gap penalties
-12/-1 (7, 16). The default parameters and matrix (BLOSUM62)
were used for blast and wu-blast2.

The "Coverage vs. Error" Plot. To test a particular protocol
(comprising a program and scoring scheme), each sequence
from the database was used as a query to search the database.
This yielded ordered pairs of query and target sequences with
associated scores, which were sorted, on the basis of their
scores, from best to worst. The ideal method would have
perfect separation, with all of the homologs at the top of the
list and unrelated proteins below. In practice, perfect separation
is impossible to achieve, so instead one is interested in
drawing a threshold above which there are the largest number
of related pairs of sequences consistent with an acceptable
error rate.

Our procedure involved measuring the coverage and error
for every threshold. Coverage was defined as the fraction of
structurally determined homologs that have scores above the
selected threshold; this reflects the sensitivity of a method.
Errors per query (EPQ), an indicator of selectivity, is the
number of nonhomologous pairs above the threshold divided
by the number of queries. Graphs of these data, called
coverage vs. error plots, were devised to understand how
protocols compare at different levels of accuracy. These
graphs share effectively all of the beneficial features of
Receiver Operating Characteristic (ROC) plots (33, 34) but
better represent the high degrees of accuracy required in
sequence comparison and the huge background of nonhomologs.
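The two quantities defined above can be computed directly from a ranked hit list. The following is a minimal sketch, assuming each reported pair carries a score where higher is better (E-values would need to be negated first) and a flag saying whether the pair is a structurally determined homolog. The function names are illustrative, and counting pairs at exactly the threshold score as "above" it is an assumption.

```python
from typing import List, Tuple


def coverage_vs_epq(
    hits: List[Tuple[float, bool]],
    total_homolog_pairs: int,
    num_queries: int,
) -> List[Tuple[float, float, float]]:
    """Return (threshold, coverage, errors_per_query) for each distinct score."""
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    curve: List[Tuple[float, float, float]] = []
    true_pos = 0
    false_pos = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Pairs with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        curve.append((score, true_pos / total_homolog_pairs, false_pos / num_queries))
    return curve


def coverage_at_epq(curve: List[Tuple[float, float, float]], max_epq: float) -> float:
    """Best coverage reachable without exceeding the given errors per query."""
    allowed = [cov for _, cov, epq in curve if epq <= max_epq]
    return max(allowed) if allowed else 0.0
```

Plotting coverage against EPQ for each protocol then gives one curve per protocol, and `coverage_at_epq(curve, 0.01)` reads off the coverage at 1% EPQ.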

This assessment procedure is directly relevant to practical
sequence database searching, for it provides precisely the
information necessary to perform a reliable sequence database
search. The EPQ measure places a premium on score consistency:
that is, it requires scores to be comparable for different
queries. Consistency is an aspect which has been largely
ignored in previous tests but is essential for the straightforward
or automatic interpretation of sequence comparison results.
Further, it provides a clear indication of the confidence that
should be ascribed to each match. Indeed, the EPQ measure
should approximate the expectation value reported by database
searching programs, if the programs' estimates are accurate.

Fig. 2. Unrelated proteins with high percentage identity.
Hemoglobin β-chain (pdb code 1hds chain b, ref. 38, Left) and
cellulase E2 (pdb code 1tml, ref. 39, Right) have 39% identity
over 64 residues, a level which is often believed to be indicative
of homology. Despite this high degree of identity, their
structures strongly suggest that these proteins are not related.
Appropriately, neither the raw alignment score of 85 nor the
E-value of 13 is significant. Proteins rendered by rasmol (40).

Fig. 3. Length and percentage identity of alignments of unrelated
proteins in PDB90D-B: Each pair of nonhomologous proteins found
with ssearch is plotted as a point whose position indicates the
length and the percentage identity within the alignment. Because
alignment length and percentage identity are quantized, many
pairs of proteins may have exactly the same alignment length and
percentage identity. The line shows the hssp threshold (though it
is intended to be applied with a different matrix and parameters).

Fig. 4. Reliability of statistical scores in PDB90D-B: Each line
shows the relationship between reported statistical score and
actual error rate for a different program. E-values are reported
for ssearch and fasta, whereas P-values are shown for blast and
wu-blast2. If the scoring were perfect, then the number of errors
per query and the E-values would be the same, as indicated by the
upper bold line. (P-values should be the same as EPQ for small
numbers, and diverge at higher values, as indicated by the lower
bold line.) E-values from ssearch and fasta are shown to have
good agreement with EPQ but underestimate the significance
slightly. blast and wu-blast2 are overconfident, with the degree
of exaggeration dependent upon the score. The results for
PDB40D-B were similar to those for PDB90D-B despite the
difference in number of homologs detected. This graph could be
used to roughly calibrate the reliability of a given statistical
score.

The Performance of Scoring Schemes. All of the programs
tested could provide three fundamental types of scores. The
first score is the percentage identity, which may be computed
in several ways based on either the length of the alignment or
the lengths of the sequences. The second is a "raw" or
"Smith-Waterman" score, which is the measure optimized by
the Smith-Waterman algorithm and is computed by summing
the substitution matrix scores for each position in the alignment
and subtracting gap penalties. In blast, a measure
related to this score is scaled into bits. Third is a statistical
score based on the extreme value distribution. These results
are summarized in Fig. 1.
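The first two score types can be illustrated on a single gapped alignment. This is a minimal sketch, not the scoring code of any of the programs tested: the alignment is assumed to be given as two equal-length strings with "-" for gaps, the substitution function is supplied by the caller, and the gap convention (a fixed cost for the first gapped residue and a smaller cost for each additional one, with defaults loosely mirroring the -12/-1 penalties mentioned earlier) is an assumption.

```python
from typing import Callable


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical positions as a percentage of the alignment length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


def raw_score(
    aligned_a: str,
    aligned_b: str,
    substitution: Callable[[str, str], int],
    gap_first: int = 12,
    gap_extend: int = 1,
) -> int:
    """Sum substitution scores over aligned columns and subtract gap penalties."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    open_gap = None  # which sequence currently has an open gap, if any
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # Continuing a gap in the same sequence is cheaper than opening one.
            score -= gap_extend if open_gap == side else gap_first
            open_gap = side
        else:
            score += substitution(x, y)
            open_gap = None
    return score
```

With a toy substitution function such as `lambda x, y: 5 if x == y else -2`, the alignment `ACDEFG` / `AC--YG` has 50% identity but a raw score of 0, which shows how gaps and mismatches that percentage identity ignores still lower the raw score.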

Sequence Identity. Though it has been long established that
percentage identity is a poor measure (35), there is a common
rule-of-thumb stating that 30% identity signifies homology.
Moreover, publications have indicated that 25% identity can
be used as a threshold (17, 36). We find that these thresholds,
originally derived years ago, are not supported by present
results. As databases have grown, so have the possibilities for
chance alignments with high identity; thus, the reported cutoffs
lead to frequent errors. Fig. 2 shows one of the many pairs of
proteins with very different structures that nonetheless have
high levels of identity over considerable aligned regions.
Despite the high identity, the raw and the statistical scores for
such incorrect matches are typically not significant. The principal
reasons percentage identity does so poorly seem to be
that it ignores information about gaps and about the conservative
or radical nature of residue substitutions.

From the PDB90D-B analysis in Fig. 3, we learn that 30%
identity is a reliable threshold for this database only for
sequence alignments of at least 150 residues. Because one
unrelated pair of proteins has 43.5% identity over 62 residues,
it is probably necessary for alignments to be at least 70 residues
in length before 40% is a reasonable threshold, for a database
of this particular size and composition.

At a given reliability, scores based on percentage identity
detect just a fraction of the distant homologs found by
statistical scoring. If one measures the percentage identity in
the aligned regions without consideration of alignment length,
then a negligible number of distant homologs are detected.
Use of the hssp equation improves the value of percentage
identity, but even this measure can find only 4% of all known
homologs at 1% EPQ. In short, percentage identity discards
most of the information measured in a sequence comparison.

Raw Scores. Smith-Waterman raw scores perform better
than percentage identity (Fig. 1), but ln-scaling (7) provided no
notable benefit in our analysis. It is necessary to be very precise
when using either raw or bit scores because a 20% change in
cutoff score could yield a tenfold difference in EPQ. However,
it is difficult to choose appropriate thresholds because the
reliability of a bit score depends on the lengths of the proteins
matched and the size of the database. Raw score thresholds
also are affected by matrix and gap parameters.
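The dependence of a bit score's reliability on search size can be made concrete with the widely used Karlin-Altschul relation E = m n 2^(-S'), where S' is the bit score and m and n are the query and database lengths. The sketch below is an illustration under that assumption only: it ignores edge-effect length corrections, the function names are hypothetical, and the lengths in the usage note are made-up values rather than figures from this study.

```python
import math


def expected_chance_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance matches scoring at least `bit_score`."""
    return query_len * db_len * 2.0 ** (-bit_score)


def bit_score_for_e(target_e: float, query_len: int, db_len: int) -> float:
    """Bit score needed to reach a target expected number of chance matches."""
    if target_e <= 0:
        raise ValueError("target_e must be positive")
    return math.log2(query_len * db_len / target_e)
```

For a hypothetical 200-residue query, a 30-bit match corresponds to about 0.019 expected chance hits against 100,000 residues but about 19 against 100,000,000 residues, so a fixed bit-score cutoff does not carry a fixed error rate across databases.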

Statistical Scores. Statistical scores were introduced partly
to overcome the problems that arise from raw scores. This
scoring scheme provides the best discrimination between
homologous proteins and those which are unrelated. Most
likely, its power can be attributed to its incorporation of more
information than any other measure; it takes account of the
full substitution and gap data (like raw scores) but also has
details about the sequence lengths and composition and is
scaled appropriately.

We find that statistical scores are not only powerful, but also
easy to interpret. ssearch and fasta show close agreement
between statistical scores and actual number of errors per
query (Fig. 4). The expectation value score gives a good,
slightly conservative estimate of the chances of the two sequences
being found at random in a given query. Thus an
E-value of 0.01 indicates that roughly one pair of nonhomologs
of this similarity should be found in every 100 different queries.
Neither raw scores nor percentage identity can be interpreted
in this way, and these results validate the suitability of the
extreme value distribution for describing the scores from a
database search.
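The interpretation of an E-value as an expected count, and its relation to a P-value, can be sketched in a few lines. This assumes the standard Poisson model in which chance matches occur independently, so that P = 1 - exp(-E); that model and the function names are assumptions of this illustration rather than details taken from the programs tested.

```python
import math


def expected_errors(e_value_cutoff: float, num_queries: int) -> float:
    """Expected number of chance matches at this cutoff over many queries."""
    return e_value_cutoff * num_queries


def p_from_e(e_value: float) -> float:
    """Probability of at least one chance match, given the expected count."""
    return -math.expm1(-e_value)


def e_from_p(p_value: float) -> float:
    """Expected count of chance matches implied by a P-value below 1."""
    if not 0.0 <= p_value < 1.0:
        raise ValueError("p_value must be in [0, 1)")
    return -math.log1p(-p_value)
```

Under this model `expected_errors(0.01, 100)` is 1, matching the reading of an E-value of 0.01 given above, and `p_from_e` stays close to E for small values while saturating toward 1 for large ones, which is the divergence between P-values and EPQ noted in the caption of Fig. 4.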

The P-values from blast also should be directly interpretable
but were found to overstate significance by more than two
orders of magnitude for 1% EPQ for this database. Nonetheless,
these results strongly suggest that the analytic theory is
fundamentally appropriate. wu-blast2 scores were more
reliable than those from blast, but also exaggerate expected
confidence by more than an order of magnitude at 1% EPQ.

Overall Detection of Homologs and Comparison of Algorithms.
The results in Fig. 5A and Table 1 show that pairwise
sequence comparison is capable of identifying only a small
fraction of the homologous pairs of sequences in PDB40D-B.
Even ssearch with E-values, the best protocol tested, could
find only 18% of all relationships at a 1% EPQ. blast, which
identifies 15%, was the worst performer, whereas fasta
ktup = 1 is nearly as effective as ssearch. fasta ktup = 2 and
wu-blast2 are intermediate in their ability to detect homologs.
Comparison of different algorithms indicates that
those capable of identifying more homologs are generally
slower. ssearch is 25 times slower than blast and 6.5 times
slower than fasta ktup = 1. wu-blast2 is slightly faster than
fasta ktup = 2, but the latter has more interpretable scores.
In PDB90D-B, where there are many close relationships, the
best method can identify only 38% of structurally known
homologs (Fig. 5B). The method which finds that many
relationships is wu-blast2. Consequently, we infer that the
differences between fasta ktup = 1, ssearch, and wu-blast2
programs are unlikely to be significant when compared with
variation in database composition and scoring reliability.

Fig. 6 helps to explain why most distant homologs cannot be
found by sequence comparison: a great many such relationships
have no more sequence identity than would be expected
by chance. ssearch with E-values can recognize >90% of the
homologous pairs with 30-40% identity. In this region, there
are 30 pairs of homologous proteins that do not have significant
E-values, but 26 of these involve sequences with <50
residues. Of sequences having 25-30% identity, 75% are
identified by ssearch E-values. However, although the number
of homologs grows at lower levels of identity, the detection
falls off sharply: only 40% of homologs with 20-25% identity
are detected and only 10% of those with 15-20% can be found.
These results show that statistical scores can find related
proteins whose identity is remarkably low; however, the power
of the method is restricted by the great divergence of many
protein sequences.

After completion of this work, a new version of pairwise
blast was released: blastgp (37). It supports gapped alignments,
like wu-blast2, and dispenses with sum statistics. Our
initial tests on blastgp using default parameters show that its
E-values are reliable and that its overall detection of homologs
was substantially better than that of ungapped blast but not
quite equal to that of wu-blast2.

CONCLUSION 

The general consensus amongst experts (see refs. 7, 24, 25, 27
and references therein) suggests that the most effective sequence
searches are made by (i) using a large current database
in which the protein sequences have been complexity masked
and (ii) using statistical scores to interpret the results. Our
experiments fully support this view.

Our results also suggest two further points. First, the E-values
reported by fasta and ssearch give fairly accurate
estimates of the significance of each match, but the P-values
provided by blast and wu-blast2 underestimate the true
extent of errors. Second, ssearch, wu-blast2, and fasta
ktup = 1 perform best, though blast and fasta ktup = 2
detect most of the relationships found by the best procedures
and are appropriate for rapid initial searches.

The homologous proteins that are found by sequence comparison
can be distinguished with high reliability from the huge
number of unrelated pairs. However, even the best database
searching procedures tested fail to find the large majority of
distant evolutionary relationships at an acceptable error rate.
Thus, if the procedures assessed here fail to find a reliable
match, it does not imply that the sequence is unique; rather, it
indicates that any relatives it might have are distant ones.**

**Additional and updated information about this work, including
supplementary figures, may be found at http://sss.stanford.edu/sss/.
1. An important feature of the work of many molecular biologists is identifying which
genes are switched on and off in a cell under different environmental conditions or
subsequent to xenobiotic challenge. Such information has many uses, including the
deciphering of molecular pathways and facilitating the development of new experimental
and diagnostic procedures. However, the student of gene hunting should be forgiven for
perhaps becoming confused by the mountain of information available, as there appear to be
almost as many methods of discovering differentially expressed genes as there are research
groups using the technique.

2. The aim of this review was to clarify the main methods of differential gene expression
analysis and the mechanistic principles underlying them. Also included is a discussion on
some of the practical aspects of using this technique. Emphasis is placed on the so-called
'open' systems, which require no prior knowledge of the genes contained within the study
model. Whilst these will eventually be replaced by 'closed' systems in the study of human,
mouse and other commonly studied laboratory animals, they will remain a powerful tool for
those examining less fashionable models.

3. The use of suppression-PCR subtractive hybridization is exemplified in the
identification of up- and down-regulated genes in rat liver following exposure to
phenobarbital, a well-known inducer of the drug metabolizing enzymes.

4. Differential gene display provides a coherent platform for building libraries and
microchip arrays of 'gene fingerprints' characteristic of known enzyme inducers and
xenobiotic toxicants, which may be interrogated subsequently for the identification and
characterization of xenobiotics of unknown biological properties.
Introduction

It is now apparent that the development of almost all cancers and many
non-neoplastic diseases is accompanied by altered gene expression in the affected cells
compared to their normal state (Hunter 1991, Wynford-Thomas 1991, Vogelstein
and Kinzler 1993, Semenza 1994, Cassidy 1995, Kleinjan and Van Heyningen 1998).
Such changes also occur in response to external stimuli such as pathogenic
micro-organisms (Rohn et al. 1996, Singh et al. 1997, Griffin and Krishna 1998, Lunney
1998) and xenobiotics (Sewall et al. 1995, Dogra et al. 1998, Ramana and Kohli
1998), as well as during the development of undifferentiated cells (Hecht 1998,
Rudin and Thompson 1998, Schneider-Maunoury et al. 1998). The potential
medical and therapeutic benefits of understanding the molecular changes which
occur in any given cell in progressing from the normal to the 'altered' state are
enormous. Such profiling essentially provides a 'fingerprint' of each step of a
cell's development or response and should help in the elucidation of specific and
sensitive biomarkers representing, for example, different types of cancer or previous
exposure to certain classes of chemicals that are enzyme inducers.

In drug metabolism, many of the xenobiotic-metabolizing enzymes (including
the well-characterized isoforms of cytochrome P450) are inducible by drugs and
chemicals in man (Pelkonen et al. 1998), predominantly involving transcriptional
activation of not only the cognate cytochrome P450 genes, but additional cellular
proteins which may be crucial to the phenomenon of induction. Accordingly, the
development of methodology to identify and assess the full complement of genes
that are either up- or down-regulated by inducers is crucial in the development of
knowledge to understand the precise molecular mechanisms of enzyme induction
and how this relates to drug action. Similarly, in the field of chemical-induced
toxicity, it is now becoming increasingly obvious that most adverse reactions to
drugs and chemicals are the result of multiple gene regulation, some of which are
causal and some of which are casually-related to the toxicological phenomenon per
se. This observation has led to an upsurge in interest in gene-profiling technologies
which differentiate between the control and toxin-treated gene pools in target tissues
and is, therefore, of value in rationalizing the molecular mechanisms of
xenobiotic-induced toxicity. Knowledge of toxin-dependent gene regulation in target tissues is
not solely an academic pursuit, as much interest has been generated in the
pharmaceutical industry to harness this technology in the early identification of toxic
drug candidates, thereby shortening the developmental process and contributing
substantially to the safety assessment of new drugs. For example, if the gene profile
in response to, say, a testicular toxin that has been well-characterized in vivo could be
determined in the testis, then this profile would be representative of all new drug
candidates which act via this specific molecular mechanism of toxicity, thereby
providing a useful and coherent approach to the early detection of such toxicants.
Whereas it would be informative to know the identity and functionality of all genes
up/down regulated by such toxicants, this would appear a longer term goal, as the
majority of human genes have not yet been sequenced, far less their functionality
determined. However, the current use of gene profiling yields a pattern of gene
changes for a xenobiotic of unknown toxicity which may be matched to that of
well-characterized toxins, thus alerting the toxicologist to possible in vivo similarities
between the unknown and the standard, thereby providing a platform for more
extensive toxicological examination. Such approaches are beginning to gain
momentum, in that several biotechnology companies are commercially producing
'gene chips' or 'gene arrays' that may be interrogated for toxicity assessment of
xenobiotics. These chips consist of hundreds/thousands of genes, some of which are
degenerate in the sense that not all of the genes are mechanistically related to any
one toxicological phenomenon. Whereas these chips are useful in broad-spectrum
screening, they are maturing at a substantial rate, in that gene arrays are now
becoming more specific, e.g. chips for the identification of changes in growth factor
families that contribute to the aetiology and development of chemically-induced
neoplasias.

Although documenting and explaining these genetic changes presents a
formidable obstacle to understanding the different mechanisms of development and
disease progression, the technology is now available to begin attempting this difficult
challenge. Indeed, several 'differential expression analysis' methods have been
developed which facilitate the identification of gene products that demonstrate
altered expression in cells of one population compared to another. These methods
have been used to identify differential gene expression in many situations, including
[illegible] (Zhao et al. 1998), in cells responding to extracellular
and intracellular microbial invasion (Duguid and Dinauer 1990, Ragno et al. 1997,
Maldarelli et al. 1998), in chemically treated cells (Syed et al. 199[?], Rockett et al.
1999), neoplastic cells (Liang et al. 1992, Chang and Terzaghi-Howe 1998),
activated cells (Gurskaya et al. 1996, Wan et al. 1996), differentiated cells (Hara et
al. 1991, Guimaraes et al. 199[?]a, b), and different cell types (Davis et al. 1984,
Hedrick et al. 1984, Xh et al. 1998). Although differential expression analysis
technologies are applicable to a broad range of models, perhaps their most important
advantage is that, in most cases, absolutely no prior knowledge of the specific genes
which are up- or down-regulated is required.

The field of differential expression analysis is a large and complex one, with
[illegible] potential to be categorized into
several methodological approaches, including:

(1) Differential screening,

(2) Subtractive hybridization (SH) (includes methods such as chemical
cross-linking subtraction, CCLS; suppression-PCR subtractive hybridization,
SSH; and representational difference analysis, RDA),

(3) Differential display (DD),

(4) Restriction endonuclease facilitated analysis (including serial analysis of gene
expression, SAGE, and gene expression fingerprinting, GEF),

(5) Gene expression arrays, and

(6) Expressed sequence tag (EST) analysis.

The above approaches have been used successfully to isolate differentially
expressed genes in different model systems. However, each method has its own
subtle (and sometimes not so subtle) characteristics which incur various advantages
and disadvantages. Accordingly, it is the purpose of this review to clarify the
mechanistic principles underlying the main differential expression methods and to
highlight some of the broader considerations and implications of this very powerful
and increasingly popular technique. Specifically, we will concentrate on the
so-called 'open' systems, namely those which do not require any knowledge of gene
sequences and are therefore useful for isolating [illegible]. Two 'closed'
systems ([illegible] previously identified gene sequences), EST analysis and the
[illegible], [illegible] briefly for completeness. Whilst
emphasis will often be placed on suppression PCR subtractive hybridization (SSH,
the approach employed in this laboratory), it is the aim of the authors to highlight,
wherever possible, those areas of common interest to those who use, or intend to use,
differential gene expression analysis.

Differential cDNA library screening (DS) 

Despite the development of multiple technological advances which have recently
brought the field of gene expression profiling to the forefront of molecular analysis,
recognition of the importance of differential gene expression and characterization of
differentially expressed genes has existed for many years. One of the original
[illegible] St John and Davis (1979). These authors developed a method, termed
'differential plaque filter hybridization', which was used to isolate
galactose-inducible DNA sequences from
yeast. The theory is simple: a genomic DNA library is prepared from normal
unstimulated cells of the test organism/tissue and multiple filter replicas are
prepared. These replica blots are probed with radioactively (or otherwise) labelled
complex cDNA probes prepared from the control and test cell mRNA populations.
Those mRNAs which are differentially expressed in the treated cell population will
show a positive signal only on the filter probed with cDNA from the treated cells.
Furthermore, labelled cDNA from different test conditions can be used to probe
multiple blots, thereby enabling the identification of mRNAs which are only
up-regulated under certain conditions. For example, St John and Davis (1979) screened
replica filters with acetate-, glucose- and galactose-derived probes in order to obtain
genes induced specifically by galactose metabolism. Although groundbreaking in its
time, this method is now considered insensitive and time-consuming, as up to 2
months are required to complete the identification of genes which are differentially
expressed in the test population. In addition, there is no convenient way to check
that the procedure has worked until the whole process has been completed.

Subtractive Hybridization (SH) 

The developing concept of differential gene expression and the success of early 
approaches such as that described by St John and Davis (1979) soon gave rise to a 
search for more convenient methods of analysis. One of the first to be developed was 
SH, numerous variations of which have since been reported (see below). In general 
this approach involves hybridization of mRNA/cDNA from one population (tester)
to excess mRNA/cDNA from another (driver), followed by separation of the
unhybridized tester fraction (differentially expressed) from the hybridized common
sequences. This step has been achieved physically, chemically and through the use
of selective polymerase chain reaction (PCR) techniques.
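
The tester/driver logic described above can be sketched as a simple bookkeeping
exercise. The following is a toy model only, not any published protocol: it assumes
that every driver copy titrates out exactly one matching tester copy, and the function
name, the transcript names and the 20-fold driver excess are illustrative assumptions.

```python
def subtract(tester, driver, driver_excess=20):
    """Toy model of subtractive hybridization.

    tester, driver: dicts mapping a transcript name to its copy number.
    driver_excess: fold excess of driver over tester (an assumed value).
    Returns the tester copies left unhybridized (single stranded).
    """
    remaining = {}
    for species, copies in tester.items():
        # Assumption: each driver copy removes one matching tester copy.
        driver_copies = driver.get(species, 0) * driver_excess
        left = max(copies - driver_copies, 0)
        if left:
            remaining[species] = left
    return remaining


# Hypothetical populations: one common message, one tester-specific message.
tester = {"albumin": 1000, "CYP2B1": 50}
driver = {"albumin": 1000}
print(subtract(tester, driver))
```

Swapping the two arguments models the reverse subtraction, which enriches species
that are down-regulated in the treated population.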



Physical separation 

Original subtractive hybridization technology involved the physical separation
of hybridized common species from unique single stranded species. Several methods
of achieving this have been described, including hydroxyapatite chromatography
(Sargent and Dawid 1983), avidin-biotin technology (Duguid and Dinauer 1990)
and oligodT-latex separation (Hara et al. 1991). In the first approach, common
mRNA species are removed by cDNA (from test cells)-mRNA (from control cells)
subtractive hybridization followed by hydroxyapatite chromatography, as
hydroxyapatite specifically adsorbs the cDNA-mRNA hybrids. The unadsorbed cDNA is
then used either for the construction of a cDNA library of differentially expressed
genes (Sargent and Dawid 1983, Schneider et al. 1988) or directly as a probe to
screen a preselected library (Zimmerman et al. 1980, Davis et al. 1984, Hedrick et al.
1984). A schematic diagram of the procedure is shown in figure 1.

Less rigorous physical separation procedures coupled with sensitivity-enhancing
PCR steps were later developed as a means to overcome some of the problems
encountered with the hydroxyapatite procedure. For example, Duguid and Dinauer
(1990) described a method of subtraction utilizing biotin-affinity systems as a means
to remove hybridized common sequences. In this process, both the control and
tester mRNA populations are first converted to cDNA and an adaptor ('oligovector',

Figure 1. The hydroxyapatite method of subtractive hybridization. cDNA derived from the
treated/altered (tester) population is mixed with a large excess of mRNA from the control (driver)
population. Following hybridization, mRNA-cDNA hybrids are removed by hydroxyapatite
chromatography. The only cDNAs which remain are those which are differentially expressed in
the treated/altered population. In order to facilitate the recovery of full length clones, small cDNA
fragments are removed by exclusion chromatography. The remaining cDNAs are then cloned into
a vector for sequencing, or labelled and used directly to probe a library, as described by Sargent
and Dawid (1983).

containing a restriction site) ligated to both sides. Both populations are then
amplified by PCR, but the driver cDNA population is subsequently digested with
the restriction endonuclease whose site is contained in the adaptor. This serves to
cleave the oligovector and reduce the amplification potential of the control
population. The digested control population is then biotinylated and an excess mixed
with tester cDNA. Following denaturation and hybridization, the mix is applied to a
biocytin column (streptavidin may also be used) to remove the control population,
including heteroduplexes formed by annealing of common sequences from the tester
population. The procedure is repeated several times following the addition of fresh
[Figure 2 flow diagram: control (driver) mRNA; anneal mRNA to polydT latex beads;
cDNA synthesis; mix and anneal; centrifuge beads, collect and store supernatant,
dissociate polyA, reapply supernatant; tester-specific mRNA retrieved after 4 rounds
of hybridization; cDNA synthesis, ligate adaptors and insert into vector; sequence
inserts and/or carry out other downstream applications.]

Figure 2. The use of oligodT30 latex to perform subtractive hybridization. mRNA extracted from the
control (driver) population is converted to anchored cDNA using polydT oligonucleotides
attached to latex beads. mRNA from the treated/altered (tester) population is repeatedly
hybridized against an excess of the anchored driver cDNA. The final population of mRNA is
tester specific and can be converted into cDNA for cloning and other downstream applications, as
described by Hara et al. (1991).
control cDNA. In order to further enrich those species differentially expressed in
the tester cDNA, the subtracted tester population is amplified by PCR following
every second subtraction cycle. After six cycles of subtraction (three reamplification
steps) the reaction mix is ligated into a vector for further analysis.

In a slightly different approach, Hara et al. (1991) utilized a method whereby
oligo(dT30) primers attached to a latex substrate are used to first capture mRNA
extracted from the control population. Following 1st strand cDNA synthesis, the
RNA strand of the heteroduplexes is removed by heat denaturation and
centrifugation (the cDNA-oligotex-dT30 forms a pellet and the supernatant is removed).
A quantity of tester mRNA is then repeatedly hybridized to the immobilized control
(driver) cDNA (which is present in 20-fold excess). After several rounds of
hybridization the only mRNA molecules left in the tester mRNA population are
those which are not found in the driver cDNA-oligotex-dT30 population. These
tester-specific mRNA species are then converted to cDNA and, following the
addition of adaptor sequences, amplified by PCR. The PCR products are then
ligated into a vector for further analysis using restriction sites incorporated into the
PCR primers. A schematic illustration of this subtraction process is shown in figure 2.

However, all these methods utilising physical separation have been described as 
inefficient due to the requirement for large starting amounts of mRNA, significant 
loss of material during the separation process and a need for several rounds of 
hybridization. Hence, new methods of differential expression analysis have recently 
been designed to eliminate these problems. 
Chemical Cross-Linking Subtraction (CCLS)

In this technique, originally described by Hampson et al. (1992), driver mRNA
is mixed with tester cDNA (1st strand only) in a ratio of > 20:1. The common
sequences form cDNA:mRNA hybrids, leaving the tester specific species as single
stranded cDNA. Instead of physically separating these hybrids, they are inactivated
chemically using 2,5-diaziridinyl-1,4-benzoquinone (DZQ). Labelled probes are
then synthesized from the remaining single stranded cDNA species (unreacted
mRNA species remaining from the driver are not converted into probe material due
to the specificity of the Sequenase T7 DNA polymerase used to make the probe) and
used to screen a cDNA library made from the tester cell population. A schematic
diagram of the system is shown in figure 3.

It has been shown that the differentially expressed sequences can be enriched at
least 300-fold with one round of subtraction (Hampson et al. 1992), and that the
technique should allow isolation of cDNAs derived from transcripts that are present
at less than 50 copies per cell. This equates to genes at the low end of intermediate
abundance (see table 1). The main advantages of the CCLS approach are that it is
rapid, technically simple and also produces fewer false positives than other
differential expression analysis methods. However, like the physical separation
protocols, a major drawback with CCLS is the large amount of starting material
required (at least 10 µg RNA). Consequently, the technique has recently been
refined so that a renewable source of RNA can be generated. The degenerate random
oligonucleotide primed (DROP) adaptation (Hampson et al. 1996, Hampson and
Hampson 1997) uses random hexanucleotide sequences to prime solid
phase-synthesized cDNA. Since each primer includes a T7 polymerase promoter sequence
Table 1. The abundance of mRNA species and classes in a typical mammalian cell.

               Copies of       No. of mRNA    Mean % of       Mean mass (ng) of
mRNA           each            species in     each species    each species/µg
class          species/cell    class          in class        total RNA

Abundant       12000           4              3.3             1.65
Intermediate   300             500            0.08            0.04
Rare           15              11000          0.004           0.002

Modified from Bertioli et al. (1995).
at the 5' end, the final pool of random cDNA fragments is a PCR-renewable cDNA
population which is representative of the expressed gene pool and can be used to
synthesize sense RNA for use as driver material. Furthermore, if the final pool of
random cDNA fragments is reamplified using biotinylated T7 primer and random
hexamer, the product can be captured with streptavidin beads and the antisense
strand eluted for use as tester. Since both target and driver can be generated from
the same DROP product, subtraction can be performed in both directions (i.e. for
up- and down-regulated species) between two different DROP products.
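
The percentage and mass columns of table 1 follow arithmetically from the first two
columns, and the calculation is sketched below as a check on the figures. The
assumption that mRNA makes up 5% of total RNA is not stated in the table; it is
inferred here because it reproduces the mass column. The variable and function
names are illustrative.

```python
# Assumed: mRNA is about 5% of total RNA (inferred, not stated in table 1).
MRNA_FRACTION_OF_TOTAL_RNA = 0.05

# class name: (copies of each species per cell, number of species in class)
CLASSES = {
    "abundant": (12000, 4),
    "intermediate": (300, 500),
    "rare": (15, 11000),
}


def abundance_table(classes=CLASSES, mrna_fraction=MRNA_FRACTION_OF_TOTAL_RNA):
    """Return (total mRNA copies per cell, per-class derived values)."""
    total_copies = sum(copies * n for copies, n in classes.values())
    rows = {}
    for name, (copies, _n_species) in classes.items():
        share = copies / total_copies  # fraction of all mRNA molecules
        rows[name] = {
            "percent_of_mrna": 100 * share,
            # 1 ug total RNA = 1000 ng, of which mrna_fraction is mRNA
            "ng_per_ug_total_rna": 1000 * mrna_fraction * share,
        }
    return total_copies, rows


total, rows = abundance_table()
for name, values in rows.items():
    print(name, values)
```

On this basis a transcript present at 50 copies per cell lies between the rare
(15 copies) and intermediate (300 copies) classes.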

Representational Difference Analysis (RDA) 

RDA of cDNA (Hubank and Schatz 1994) is an extension of the technique 
originally applied to genomic DNA as a means of identifying differences between 
two complex genomes (Lisitsyn et al. 1993). It is a process of subtraction and 
amplification involving subtractive hybridization of the tester in the presence of 
excess driver. Sequences in the tester that have homologues in the driver are 
rendered unamplifiable, whereas those genes expressed only in the tester retain the 
ability to be amplified by PCR. The procedure is shown schematically in figure 4. 

In essence, the driver and tester mRNA populations are first converted to cDNA
and amplified by PCR following the ligation of an adaptor. The adaptors are then
removed from both populations and a new (different) adaptor ligated to the
amplified tester population only. Driver and tester populations are next melted and
hybridized together in a ratio of 100:1. Following hybridization, only tester:tester
homohybrids have 5' adaptors at each end of the DNA duplex and can, thus, be filled
in at both 3' ends. Hence, only these molecules are amplified exponentially during
the subsequent PCR step. Although tester:driver heterohybrids are present, they
only amplify in a linear fashion, since the strand derived from the driver has no
adaptor to which the primer can bind. Driver:driver homohybrids have no
adaptors and, therefore, are not amplified. Single stranded molecules are digested
with mung bean nuclease before a further PCR-enrichment of the tester:tester
homohybrids. The adaptors on the amplified tester population are then replaced and
the whole process repeated a further two or three times using an increasing excess of
driver (Hubank and Schatz used a tester:driver ratio of 1:400, 1:80000 and
1:800000 for the second, third and fourth hybridizations, respectively). Different
adaptors are ligated to the tester between successive rounds of hybridization and
amplification to prevent the accumulation of PCR products that might interfere with
subsequent amplifications. The final display is a series of differentially expressed
gene products easily observable on an ethidium bromide gel.
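
The difference between exponential, linear and no amplification is what gives RDA
its selectivity, and it can be made concrete with an idealized count. The sketch
below assumes perfectly efficient PCR cycles; the function name and the choice of
20 cycles are illustrative assumptions, not part of the published procedure.

```python
def copies_after_pcr(hybrid, cycles, start=1):
    """Idealized product count for one hybrid type after a number of cycles."""
    if hybrid == "tester:tester":
        # Adaptors on both strands: the product doubles every cycle.
        return start * 2 ** cycles
    if hybrid == "tester:driver":
        # Only the tester strand carries an adaptor: one new copy per cycle.
        return start * (1 + cycles)
    if hybrid == "driver:driver":
        # No adaptors, no primer binding, no amplification.
        return start
    raise ValueError(f"unknown hybrid type: {hybrid}")


for hybrid in ("tester:tester", "tester:driver", "driver:driver"):
    print(hybrid, copies_after_pcr(hybrid, cycles=20))
```

Under these assumptions, 20 cycles give roughly a million-fold amplification of
tester:tester homohybrids against a 21-fold increase for heterohybrids.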

The main advantages of RDA are that it offers a reproducible and sensitive
approach to the analysis of differentially expressed genes. Hubank and Schatz (1994)
reported that they were able to isolate genes that were differentially expressed in
substantially less than 1% of the cells from which the tester is derived. Perhaps the
main drawback is that multiple rounds of ligation, hybridization, amplification and
digestion are required. The procedure is, therefore, lengthier than many other
differential display approaches and provides more opportunity for operator-induced
error to occur. Although the generation of false positives has been noted, this has
been solved to some degree by O'Neill and Sinclair (1997) through the use of
HPLC-purified adaptors. These are free of the truncated adaptors which appear to be a
major source of the false positive bands. A very similar technique to RDA, termed
linker capture subtraction (LCS), was described by Yang and Sytowski (1996).
Figure 4. The representational difference analysis (RDA) technique. Driver and tester cDNA are
digested with a 4-cutter restriction enzyme such as DpnII. The 1st set of 12/24 adaptor strands
(oligonucleotides) are ligated to each other and the digested cDNA products. The 12mer is
subsequently melted away and the 3' ends filled in using Taq DNA polymerase. Each cDNA
population is then amplified using PCR, following which the 1st set of adaptors is removed with
DpnII. A second set of 12/24 adaptor strands is then added to the amplified tester cDNA
population, after which the tester is hybridized against a large excess of driver. The 12mer
adaptors are melted and the 3' ends filled in as before. PCR is carried out with primers identical
to the new 24mer adaptor. Thus, the only hybridization products which are exponentially
amplified are those which are tester:tester combinations. Following PCR, ssDNA products are
removed with mung bean nuclease, leaving the 'first difference product'. This is digested and a
third set of 12/24 adaptors added before repeating the subtraction process from the hybridization
stage. This is repeated to the 3rd or 4th difference product, as described by Lisitsyn et al.
(1993) and Hubank and Schatz (1994).
Suppression PCR Subtractive Hybridization (SSH)

The most recent adaptation of the SH approach to differential expression
analysis was first described by Diatchenko et al. (1996) and Gurskaya et al. (1996).
They reported that a 1000-5000 fold enrichment of rare cDNAs (equivalent to
isolating mRNAs present at only a few copies per cell) can be obtained without the
need for multiple hybridizations/subtractions. Instead of physical or chemical
removal of the common sequences, a PCR-based suppression system is used (see
figure 5).

In SSH, excess driver cDNA is added to two portions of the tester cDNA which
have been ligated with different adaptors. A first round of hybridization serves to
enrich differentially expressed genes and equalize rare and abundant messages.
Equalization occurs since reannealing is more rapid for abundant molecules than for
rarer molecules due to the second order kinetics of hybridization (James and Higgins
1985). The two primary hybridization mixes are then mixed together in the presence
of excess driver and allowed to hybridize further. This step permits the annealing of
single stranded complementary sequences which did not hybridize in the primary
hybridization, and in doing so generates templates for PCR amplification. Although
there are several possible combinations of the single stranded molecules present in
the secondary hybridization mix, only one particular combination (differentially
expressed in the tester cDNA, composed of complementary strands having different
adaptors) can amplify exponentially.
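
The equalization effect follows from the standard second order rate law for
reannealing, C(t) = C0 / (1 + k C0 t), where C0 is the starting single stranded
concentration. The sketch below is a simplification that treats each species as
reannealing only with itself and ignores the driver; the rate constant, time and
concentrations are arbitrary illustrative values.

```python
def single_stranded(c0, k, t):
    """Single stranded concentration left after time t (second order kinetics)."""
    return c0 / (1.0 + k * c0 * t)


# Hypothetical abundant and rare messages differing 1000-fold at the start.
abundant_start, rare_start = 1000.0, 1.0
k, t = 1.0, 10.0  # arbitrary units, chosen only for illustration

abundant_left = single_stranded(abundant_start, k, t)
rare_left = single_stranded(rare_start, k, t)

print("ratio before:", abundant_start / rare_start)
print("ratio after: ", abundant_left / rare_left)
```

Because C(t) approaches 1/(k t) for any sufficiently abundant species, the single
stranded fractions of abundant and rare messages converge towards similar levels,
although, as noted below, equalization is not perfect in practice.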

Having obtained the final differential display, two options are available if cloning
of cDNAs is desired. One is to transform the whole of the final PCR reaction into
competent cells. Transformed colonies can then be isolated and their inserts
characterized by sequencing, restriction analysis or PCR. Alternatively, the final
PCR products can be resolved on a gel and the individual bands excised, reamplified
and cloned. The first approach is technically simpler and less time consuming.
However, ligation/transformation reactions are known to be biased towards the
cloning of smaller molecules, and so the final population of clones will probably not
contain a representative selection of the larger products. In addition, although
equalization theoretically occurs, observations in this laboratory suggest that this is
by no means perfectly accomplished. Consequently, some gene species are present
in a higher number than others and this will be represented in the final population
of clones. Thus, in order to obtain a substantial proportion of those gene species that
actually demonstrate differential expression in the tester population, the number of
clones that will have to be screened after this step may be substantial. The second
approach is initially more time consuming and technically demanding. However, it
would appear to offer better prospects for cloning larger and low abundance gel
products. In addition, one can incorporate a screening step that differentiates
different products of different sequences but of the same size (HA-staining, see
later). In this way, a good idea of the final number of clones to be isolated and
identified can be achieved.

An alternative (or even complementary) approach is to use the final differential
display reaction to screen a cDNA library to isolate full length clones for further
characterization, or a DNA array (see later) to quickly identify known genes. SSH
has been used in this laboratory to begin characterization of the short-term gene
expression profiles of enzyme-inducers such as phenobarbital (Rockett et al. 1997)
and Wy-14,643 (Rockett et al. unpublished observations). The isolation of
differentially expressed genes in this manner enables the construction of a fingerprint
Figure 6. Flow diagram showing the method used in this laboratory to isolate and identify clones of genes
which are differentially expressed in rat liver following short term exposure to the enzyme
inducers, phenobarbital and Wy-14,643.

of expressed genes which are unique to each compound and time/dose point. Such
information could be useful in short-term characterization of the toxic potential of
new compounds by comparing the gene-expression profiles they elicit with those
produced by known inducers. Figure 6 shows a flow diagram of the method used to
isolate, verify and clone differentially expressed genes, and figure 7 shows expression
profiles obtained from a typical SSH experiment. Subsequent sub-cloning of the
individual bands, sequencing and gene database interrogation reveals many genes
which are either up- or down-regulated by phenobarbital in the rat (tables 2 and 3).

One of the advantages in using the SSH approach is that no prior knowledge is
required of which specific genes are up/down-regulated subsequent to xenobiotic
Figure 7. Differential displays obtained from rat liver following treatment with Wy-14,643 or
phenobarbital. mRNA extracted from control and treated livers was used to generate the
differential displays using the PCR-Select cDNA subtraction kit (Clontech). The lane key,
which includes a 1 kb ladder, is not legible in this copy. Reproduced from Rockett et
al. (1997), with permission.



exposure, and an almost complete complement of genes is obtained. For example,
the peroxisome proliferator and non-genotoxic hepatocarcinogen Wy-14,643
up-regulates at least 28 genes and down-regulates at least 15 in the rat (a sensitive
species) and produces 48 up- and 37 down-regulated genes in the guinea pig (a
resistant species) (Rockett, Swales, Esdaile and Gibson, unpublished observations).
One of these genes, CD81, was up-regulated in the rat and down-regulated in the
guinea pig following Wy-14,643 treatment. CD81 (alternatively named TAPA-1) is
a widely expressed cell surface protein which is involved in a large number of cellular
processes including adhesion, activation, proliferation and differentiation (Levy et
al.). Since these functions are altered to some extent in the phenomena of
hepatomegaly and non-genotoxic hepatocarcinogenesis, it is intriguing, and
probably mechanistically relevant, that CD81 expression is differentially regulated
in a resistant and a susceptible species. However, the down-side of this approach is
that the majority of genes can be sequenced and matched to database sequences, but
the latter are predominantly expressed sequence tags or genes of completely
unknown function, thus partially obscuring a realistic overall assessment of the
critical genes of genuine biological interest. Notwithstanding the lack of complete
functional identification of altered gene expression, such gene profiling studies
essentially provide a 'molecular fingerprint' in response to xenobiotic challenge,
thereby serving as a mechanistically relevant platform for further detailed
investigations.



Differential Display (DD) 

Originally described as 'RNA fingerprinting by arbitrarily primed PCR' (Liang
and Pardee 1992), this method is now more commonly referred to as 'differential
Table 2. Genes up-regulated in rat liver following 3-day exposure to phenobarbital.

Band number
(approximate      Highest sequence
size in bp)       similarity           FASTA-EMBL gene identification

5 (1300)          93.5%                CYP2B1
7 (1000)          95.1%                Preproalbumin
                                       Serum albumin mRNA
8 (950)           98.3%                NCI-CGAP-Pr1 H. sapiens (EST)
10 (850)          95.7%                CYP2B1
11 (800)          Clone 1 94.9%        CYP2B1
                  Clone 2 75.3%        CYP2B2
12 (750)          93.8%                TRPM-2 mRNA
                                       Sulfated glycoprotein
15 (600)          92.9%                Preproalbumin
                                       Serum albumin mRNA
16 (55)           Clone 1 95.2%        CYP2B1
                  Clone 2 93.6%        Haptoglobulin mRNA partial alpha
21 (350)          99.3%                18S, 5.8S & 28S rRNA

Bands 1-4, 6, 9, 13, 14 and 17-20 were shown to be false positives by dot blot analysis and, therefore,
were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above genes do not
represent the complete spectrum of genes which are up-regulated in rat liver by phenobarbital, but
simply represent the genes sequenced and identified to date.



Table 3. Genes down-regulated in rat liver following 3-day exposure to phenobarbital.

Band number
(approximate      Highest sequence
size in bp)       similarity           FASTA-EMBL gene identification

1 (1500)          95.3%                3-oxoacyl-CoA thiolase
2 (1200)          92.3%                Hemopexin mRNA
3 (1000)          91.7%                Alpha-2u-globulin mRNA
7 (700)           Clone 1 77.2%        M. musculus C1 inhibitor
                  Clone 2 94.5%        Electron transfer flavoprotein
                  Clone 3 91.0%        M. musculus Topoisomerase 1 (Topo 1)
8 (650)           Clone 1 86.9%        Soares 2NbMT M. musculus (EST)
                  Clone 2 96.2%        Alpha-2u-globulin (s-type) mRNA
9 (600)           Clone 1 86.9%        Soares mouse NML M. musculus (EST)
                  Clone 2 82.0%        Soares p3NMF19.5 M. musculus (EST)
10 (530)          73.8%                Soares mouse NML M. musculus (EST)
11 (525)          95.7%                NCI-CGAP-Pr1 H. sapiens (EST)
12 (375)          100.0%               Ribosomal protein
13 (23)           Clone 1 97.2%        Soares mouse embryo NbME13.5 (EST)
                  Clone 2 100.0%       Fibrinogen B-beta-chain
                  Clone 3 100.0%       Apolipoprotein E gene
14 (170)          96.0%                Soares p3NMF19.5 M. musculus (EST)
15 (140)          97.3%                Stratagene mouse testis (EST)
Others: (300)     96.7%                R. norvegicus RASP 1 mRNA
        (275)     93.1%                Soares mouse mammary gland (EST)

EST = Expressed sequence tag. Bands 4-6 were shown to be false positives by dot blot analysis and,
therefore, were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above genes
do not represent the complete spectrum of genes which are down-regulated in rat liver by phenobarbital,
but simply represent the genes sequenced and identified to date.



display' (DD). In this method, all the mRNA species in the control and treated cell
populations are amplified in separate reactions using reverse transcriptase-PCR
(RT-PCR). The products are then run side-by-side on sequencing gels. Those
bands which are present in one display only, or which are much more intense in one
display compared to the other, are differentially expressed and may be recovered for
further characterization. One advantage of this system is the speed with which it can
be carried out: 2 days to obtain a display and as little as a week to make and identify
clones.

Two commonly used variations are based on different methods of priming the
reverse transcription step (figure 8). One is to use an oligo dT with a 2-base 'anchor'
at the 3'-end, e.g. 5' (dT11)CA 3' (Liang and Pardee 1992). Alternatively, an
arbitrary primer may be used for 1st strand cDNA synthesis (Welsh et al. 1992).
This variant of RNA fingerprinting has also been called 'RAP' (RNA Arbitrarily
Primed)-PCR. One advantage of this second approach is that PCR products may be
derived from anywhere in the RNA, including open reading frames. In addition, it
can be used for mRNAs that are not polyadenylated, such as many bacterial mRNAs
(Wong and McClelland 1994). In both cases, following reverse transcription and
denaturation, second strand cDNA synthesis is carried out with an arbitrary primer
(arbitrary primers have a single base at each position, as compared to random
primers, which contain a mixture of all four bases at each position). The resulting
PCR, thus, produces a series of products which, depending on the system (primer
length and composition, polymerase and gel system), usually includes 50-100
products per primer set (Band and Sager 1989). When a combination of different
dT-anchors and arbitrary primers are used, almost all mRNA species from a cell can
be amplified. When the cDNA products from two different populations are analysed
side by side on a polyacrylamide gel, differences in expression can be identified and
the appropriate bands recovered for cloning and further analysis.

Although DD is perhaps the most popular approach used today for identifying
differentially expressed genes, it does suffer from several perceived disadvantages:

(1) It may have a strong bias towards high copy number mRNAs (Bertioli et al.
1995), although this has been disputed (Wan et al. 1996) and the isolation of very
low abundance genes may be achieved in certain circumstances (Guimeraes et
al. 1995a).

(2) The cDNAs obtained often only represent the extreme 3' end of the mRNA
(often the 3'-untranslated region), although this may not always be the case
(Guimeraes et al. 1995a). Since the 3' end is often not included in Genbank and
shows variation between organisms, cDNAs identified by DD cannot always be
matched with their genes, even if they have been identified.

(3) The pattern of differential expression seen on the display often cannot be
reproduced on Northern blots, with false positives arising in up to 70% of cases
(Sun et al. 1994). Some adaptations have been shown to reduce false positives,
including the use of two reverse transcriptases (Sung and Denman 1997),
comparison of uninduced and induced cells over a time course (Burn et al. 1994)
and comparison of DDPCR-products from two uninduced and two induced
lines (Sompayrac et al. 1995). The latter authors also reported that the use of
cytoplasmic RNA rather than total RNA reduces false positives arising from
nuclear RNA that is not transported to the cytoplasm.

Further details of the background, strengths and weaknesses of the DD
technique can be obtained from a review by McClelland et al. (1996) and from
articles by Liang et al. (1995) and Wan et al. (1996).
Figure 8. Two approaches to differential display (DD) analysis. 1st strand synthesis can be carried out
either with a polydT11NN primer (where N = G, C or A) or with an arbitrary primer. The use of
different combinations of G, C and A to anchor the first strand polydT primer enables the priming
of the majority of polyadenylated mRNAs. Arbitrary primers may hybridize at none, one or more
places along the length of the mRNA, allowing 1st strand cDNA synthesis to occur at none, one
or more points in the same gene. In both cases, 2nd strand synthesis is carried out with an arbitrary
primer. Since these arbitrary primers for the 2nd strand may also hybridize to the 1st strand cDNA
in a number of different places, several different 2nd strand products may be obtained from one
binding point of the 1st strand primer. Following 2nd strand synthesis, the original set of primers
is used to amplify the second strand products, with the result that numerous gene sequences are
amplified.



Restriction endonuclease-facilitated analysis of gene expression 

Serial Analysis of Gene Expression (SAGE) 

A more recent development in the field of differential display is SAGE analysis
(Velculescu et al. 1995). This method uses a different approach to those discussed so
far and is based on two principles. Firstly, in more than 95% of cases, short
nucleotide sequences ('tags') of only nine or 10 base pairs provide sufficient
information to identify their gene of origin. Secondly, concatenation (linking
together in a series) of these tags allows sequencing of multiple cDNAs within a
single clone. Figure 9 shows a schematic representation of the SAGE process. In this
procedure, double stranded cDNA from the test cells is synthesized with a
biotinylated polydT primer. Following digestion with a commonly cutting (4 bp
recognition sequence) restriction enzyme ('anchoring enzyme'), the 3' ends of the
cDNA population are captured with streptavidin beads. The captured population is
[Figure 9 diagram, final step: concatenate, clone and sequence. The concatemer is drawn
as ditags (Tag 1/Tag 2, Tag 3/Tag 4; X and O bases) separated by CATG anchoring
enzyme (AE) sites.]



Figure 9. Serial analysis of gene expression (SAGE) analysis. cDNA is cleaved with an anchoring enzyme
(AE) and the 3' ends captured using streptavidin beads. The cDNA pool is divided in half and each
portion ligated to a different linker, each containing a type IIS restriction site (tagging enzyme,
TE). Restriction with the type IIS enzyme releases the linker plus a short length of cDNA
(XXXXX and OOOOO indicate nucleotides of different tags). The two pools of tags are then
ligated and amplified using linker-specific primers. Following PCR, the products are cleaved with
the AE and the ditags isolated from the linkers using PAGE. The ditags are then ligated (during
which process, concatenization occurs) and cloned into a vector of choice for sequencing. After
Velculescu et al. (1995), with permission.
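
The tag principle can be illustrated with a short sketch. It assumes sense-strand
cDNA sequences written 5' to 3', takes CATG (the site drawn in figure 9) as the
anchoring enzyme site, and reads the nine bases immediately 3' of the 3'-most site as
the tag. The function names and example sequences are hypothetical, and linker
handling, ditag formation and sequencing errors are ignored.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # anchoring enzyme site as drawn in figure 9
TAG_LENGTH = 9        # the text cites tags of nine or 10 base pairs


def extract_tag(cdna, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag 3' of the 3'-most anchor site, or None if unavailable."""
    position = cdna.rfind(anchor)  # 3'-most occurrence of the site
    if position == -1:
        return None
    start = position + len(anchor)
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def count_tags(cdnas):
    """Tally tags across a population of cDNA sequences."""
    tags = (extract_tag(cdna) for cdna in cdnas)
    return Counter(tag for tag in tags if tag is not None)


# Hypothetical transcripts: the first two share a 3' end and hence a tag.
population = [
    "GGCATGAAACCCGGGTTTAAAA",
    "TTCATGAAACCCGGGTTTAAAA",
    "CATGTTTTTTTTTCATGACGTACGTAGG",
]
print(count_tags(population))
print("possible 9-base tags:", 4 ** TAG_LENGTH)
```

The tag count for each sequence then serves as a digital measure of the abundance of
the corresponding transcript in the starting population.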
DNA arrays

'Open' differential display systems are cumbersome in that it takes a great deal
of time to extract and identify candidate genes and then confirm that they are indeed
up- or down-regulated in the treated compared to the control tissue. Normally the
latter process is carried out using Northern blotting or RT-PCR. Even so, each of
the aforementioned steps produces a bottleneck to the ultimate goal of rapid analysis
of gene expression. These problems will likely be addressed by the development of
so-called DNA arrays (e.g. Gress et al. 1992, Zhao et al. 1995, Schena et al. 1996),
the introduction of which has signalled the next era in differential gene expression
analysis. DNA arrays consist of a gridded membrane or glass 'chip' containing
hundreds or thousands of DNA spots, each consisting of multiple copies of part of
a known gene. The genes are often selected based on previously proven involvement
in oncogenesis, cell cycling, DNA repair, development and other cellular processes.
They are usually chosen to be as specific as possible for each gene and animal species.
Human and mouse arrays are already commercially available and a few companies
will construct a personalized array to order, for example Clontech Laboratories and
Research Genetics Inc. The technique is rapid in that hundreds or even thousands
of genes can be spotted on a single array, and that mRNA/cDNA from the test
populations can be labelled and used directly as probe. When analysed with
appropriate hardware and software, arrays offer a rapid and quantitative means to
assess differences in gene expression between two cell populations. Of course, there
can only be identification and quantitation of those genes which are in the array
(hence the term 'closed' system). Therefore, one approach to elucidating the
molecular mechanisms involved in a particular disease/development system may be
to combine an open and closed system: a DNA array to directly identify and
quantitate the expression of known genes in mRNA populations, and an open
system such as SSH to isolate unknown genes which are differentially expressed.
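
How array signals might be turned into a list of changed genes can be sketched as follows. This is a minimal illustration only: the gene names, intensity values, the floor applied to weak signals and the 2-fold cut-off are all assumptions made for the example, not part of any particular array system.

```python
import math


def log2_ratios(control, treated, floor=1.0):
    """Return {gene: log2(treated/control)} for genes measured in both samples.

    Intensities below `floor` are raised to `floor` so that the ratio
    stays defined for very weak spots (an illustrative choice).
    """
    ratios = {}
    for gene in control.keys() & treated.keys():
        c = max(control[gene], floor)
        t = max(treated[gene], floor)
        ratios[gene] = math.log2(t / c)
    return ratios


def changed_genes(ratios, fold=2.0):
    """Return the sorted genes whose expression changed by at least `fold`."""
    threshold = math.log2(fold)
    return sorted(g for g, r in ratios.items() if abs(r) >= threshold)
```

Only genes spotted on the array appear in either dictionary, which is the 'closed' nature of the system in practice: a gene absent from the array cannot be reported, whatever its change in expression.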

One of the main advantages of DNA arrays is the huge number of gene fragments
that can be put on a membrane: some companies have reported gridding up to
60000 spots on a single glass 'chip' (microscope slide). These high density chip-
based micro-arrays will probably become available as mass-produced off-the-shelf
items in the near future. This should facilitate the more rapid determination of
differential expression in time and dose-response experiments. Aside from their
high cost and the technical complexities involved in producing and probing DNA
arrays, the main problem which remains, especially with the newer micro-array
(gene-chip) technologies, is that results are often not wholly reproducible between
arrays. However, this problem is being addressed and should be resolved within the
next few years.



EST databases as a means to identify differentially expressed genes

Expressed sequence tags (ESTs) are partial sequences of clones obtained from
cDNA libraries. Even though most ESTs have no formal identity (putative
identification is the best to be hoped for), they have proven to be a rapid and efficient
means of discovering new genes and can be used to generate profiles of gene
expression in specific cells. Since they were first described by Adams et al. (1991),
there has been a huge explosion in EST production and it is estimated that there are
now well over a million such sequences in the public domain, representing over half
of all human genes (Hillier et al. 1996). This large number of freely available
sequences (both sequence information and clones are normally available royalty-free
from the originators) has enabled the development of a new approach towards
differential gene expression analysis as described by Vasmatzis et al. (1998). The
approach is simple in theory: EST databases are first searched for genes that have a
number of related EST sequences from the target tissue of choice, but none or few
from non-target tissue libraries. Programmes to assist in the assembly of such sets of
overlapping data may be developed in-house or obtained privately or from the
internet. For example, the Institute for Genomic Research (TIGR, found at
http://www.tigr.org) provides many software tools free of charge to the scientific
community. Included amongst these is the TIGR assembler (Sutton et al. 1995), a
tool for the assembly of large sets of overlapping data such as ESTs, bacterial
artificial chromosomes (BACs), or small genomes. Candidate EST clones repre-
senting different genes are then analysed using RNA blot methods for size and tissue
specificity and, if required, used as probes to isolate and identify the full length
cDNA clone for further characterization. In practice, however, the method is rather
more involved, requiring bioinformatic and computer analysis coupled with
confirmatory molecular studies. Vasmatzis et al. (1998) have described several
problems in this fledgling approach, such as separating highly homologous
sequences derived from different genes and an overemphasis of specificity for some
EST sequences. However, since these problems will largely be addressed by the
development of more suitable computer algorithms and an increased completeness
of the EST database, it is likely that this approach to identifying differentially
expressed genes may enjoy more patronage in the future.
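
The database search described above ('a number of related EST sequences from the target tissue of choice, but none or few from non-target tissue libraries') can be sketched as a simple counting filter. This is an illustration under assumptions, not the method of Vasmatzis et al.: the record format, the function name and the two thresholds are hypothetical, and it presumes that ESTs have already been assigned to genes.

```python
from collections import Counter


def tissue_enriched_genes(est_records, target, min_target=3, max_other=1):
    """Return genes with many ESTs from `target` and few from other libraries.

    est_records: iterable of (gene, tissue) pairs, one per EST.
    min_target:  minimum number of ESTs required from the target tissue.
    max_other:   maximum number of ESTs tolerated from all other tissues.
    """
    target_counts = Counter()
    other_counts = Counter()
    for gene, tissue in est_records:
        if tissue == target:
            target_counts[gene] += 1
        else:
            other_counts[gene] += 1
    return sorted(
        gene
        for gene, n in target_counts.items()
        if n >= min_target and other_counts[gene] <= max_other
    )
```

Genes passing the filter are only candidates; as the text notes, they would still need confirmatory molecular studies.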



Problems and potential of differential expression techniques 

The holistic or single cell approach?

When working with in vivo models of differential expression, one of the first
issues to consider must be the presence of multiple cell types in any given specimen.
For example, a liver sample is likely to contain not only hepatocytes, but also
(potentially) Ito cells, bile ductule cells, endothelial cells, various immune cells (e.g.
lymphocytes, macrophages and Kupffer cells) and fibroblasts. Other tissues will
each have their own distinctive cell populations. Also, in the case of neoplastic tissue,
there are almost always normal, hyperplastic and/or dysplastic cells present in a
sample. One must, therefore, be aware that genes obtained from a differential
display experiment performed on an animal tissue model may not necessarily arise
exclusively from the intended 'target' cells, e.g. hepatocytes/neoplastic cells. If
appropriate, further analyses using immunohistochemistry, in situ hybridization or
in situ RT-PCR should be used to confirm which cell types are expressing the
gene(s) of interest. This problem is probably most acute for those studying the
differential expression of genes in the development of different cell types, where
there is a need to examine homologous cell populations. The problem is now being
addressed at the National Cancer Institute (Bethesda, MD, USA) where new
micro-dissection techniques have been employed to assist in their gene analysis programme,
the Cancer Genome Anatomy Project (CGAP) (for more information see web site
http://www.ncbi.nlm.nih.gov/ncicgap/intro.html). There are also separation
techniques available that utilise cell-specific antigens as a means to isolate target cells,
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species present at less than 1.2 % of the total mRNA population-equivalent to an 
intermediate or abundant species. Interestingly, when simple model systems (single
target only) were used instead of a heterogeneous mRNA population, the same
primers could detect levels of target mRNA down to 10000 x smaller. These results
are probably best explained by competition for substrates from the many PCR
products produced in a DD reaction.

The numbers of differentially expressed mRNAs reported in the literature using
various model systems provide further evidence that many differentially expressed
mRNAs are not recovered. For example, DeRisi et al. (1997) used DNA array
technology to examine gene expression in yeast following exhaustion of sugar in the
medium, and found that more than 1700 genes showed a change in expression of at
least 2-fold. In light of such a finding, it would not be unreasonable to suggest that
of the 8000-15000 different mRNA species produced by any given mammalian cell,
up to 1000 or more may show altered expression following chemical stimulation.
Whilst this may be an extreme figure, it is known that at least 100 genes are
activated/upregulated in Jurkat (T-) cells following IL-2 stimulation (Ullman et al.
1990). In addition, Wan et al. (1996) estimated that interferon-γ-stimulated HeLa
cells differentially express up to 433 genes (assuming 24000 distinct mRNAs
expressed by the cells). However, there have been few publications documenting
anywhere near the recovery of these numbers. For example, in using DD to compare
normal and regenerating mouse liver, Bauer et al. (1993) found only 70 of 38000
total bands to be different. Of these, 50% (35 genes) were shown to correspond to
differentially expressed bands. Chen et al. (1996) reported 10 genes upregulated in
female rat liver following ethinyl estradiol treatment. McKenzie and Drake (1997)
identified 14 different gene products whose expression was altered by phorbol
myristate acetate (PMA, a tumour promoter agent) stimulation of a human
myelomonocytic cell line. Kilty and Vickers (1997) identified 10 different gene
products whose expression was upregulated in the peripheral blood leukocytes of
allergic disease sufferers. Linskens et al. (1995) found 23 genes differentially
expressed between young and senescent fibroblasts. Techniques other than DD
have also provided an apparent paucity of differentially expressed genes. Using SH,
for example, Cao et al. (1997) found 15 genes differentially expressed in colorectal
cancer compared to normal mucosal epithelium. Fitzpatrick et al. (1995) isolated 17
genes upregulated in rat liver following treatment with the peroxisome proliferator
clofibrate; Philips et al. (1990) isolated 12 cDNA clones which were upregulated in
highly metastatic mammary adenocarcinoma cell lines compared to poorly
metastatic ones. Prashar and Weissman (1996) used 3' restriction fragment analysis and
identified approximately 40 genes showing altered expression within 4 h of
activation of Jurkat T-cells. Groenink and Leegwater (1996) analysed 27 gene
fragments isolated using SSH of delayed early response phase of liver regeneration
and found only 12 to be upregulated.
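
The gap between expected and recovered numbers is easier to see when the figures quoted above are expressed as percentages. The short sketch below only restates numbers given in the text; the helper name `percent` is introduced here for illustration.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return part / whole * 100


# Wan et al. (1996): up to 433 genes of an assumed 24000 distinct mRNAs
wan_percent = percent(433, 24000)

# Bauer et al. (1993): 70 of 38000 total bands differed, half of them confirmed
bauer_band_percent = percent(70, 38000)
bauer_confirmed = 0.5 * 70

# Suggested upper figure: 1000 of the 8000-15000 mRNA species in a cell
suggested_low = percent(1000, 15000)
suggested_high = percent(1000, 8000)

print(f"Wan et al.: {wan_percent:.1f}% of mRNAs")
print(f"Bauer et al.: {bauer_band_percent:.2f}% of bands, {bauer_confirmed:.0f} confirmed")
print(f"Suggested: {suggested_low:.1f}-{suggested_high:.1f}% of mRNA species")
```

On these figures the DD study recovered differences in roughly a tenth of the proportion estimated by Wan et al., which is the shortfall the paragraph describes.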

In this laboratory, SSH was used to isolate up to 70 candidate genes which appear
to show altered expression in guinea pig liver following short-term treatment with
the peroxisome proliferator, WY-14,643 (Rockett, Swales, Esdaile and Gibson,
unpublished observations). However, these findings have still to be confirmed by
analysis of the extracted tissue mRNA for differential expression of these sequences.

Whilst the latest differential display technologies are purported to include design
and experimental modifications to overcome this lack of efficiency (in both the total
number of differentially expressed genes recovered and the percentage that are true
experiments and animals. DD, on the other hand, is not subject to this grey
zone since, unlike SH approaches, it does not amplify the difference in expression
between two samples. Wan et al. (1996) reported that differences in expression of
twofold or more are detectable using DD.



Resolution and visualization of differential expression products 

It seems highly improbable with current technology that a gel system could be
developed that is able to resolve all gene species showing altered expression in any
given test system (be it SH- or DD-based). Polyacrylamide gel electrophoresis
(PAGE) can resolve size differences down to 0.2% (Sambrook et al. 1989) and is
used as standard in DD experiments. Even so, it is clear that a complex series of gene
products such as those seen in a DD will contain unresolvable components. Thus,
what appears to be one band in a gel may in fact turn out to be several. Indeed, it has
been well documented (Mathieu-Daude et al. 1996, Smith et al. 1997) that a single
band extracted from a DD often represents a composite of heterogeneous products,
and the same has been found for SSH displays in this laboratory (Rockett et al.
1997). One possible solution was offered by Mathieu-Daude et al. (1996), who
extracted and reamplified candidate bands from a DD display and used single strand
conformation polymorphism (SSCP) analysis to confirm which components
represented the truly differentially expressed product.

Many scientists often try to avoid the use of PAGE where possible because it is
technically more demanding than agarose gel electrophoresis (AGE). Unfortunately,
high resolution agarose gels such as Metaphor (FMC, Lichfield, UK) and AquaPor
HR (National Diagnostics, Hessle, UK), whilst easier to prepare and manipulate
than PAGE, can only separate DNA sequences which differ in size by around
1.5-2% (15-20 base pairs for a 1 kb fragment). Thus, SSH, RDA or other such
products which differ in size by less than this amount are normally not resolvable.
However, a simple technique does in fact exist for increasing the resolving power of
AGE: the inclusion of HA-red (10-phenyl neutral red-PEG ligand) or HA-yellow
(bisbenzamide-PEG ligand) (Hanse Analytik GmbH, Bremen, Germany) in a
gel separates identical or closely sized products on base content. Specifically,
HA-red and -yellow selectively bind to GC and AT DNA motifs, respectively
(Wawer et al. 1995, Hanse Analytik 1997, personal communication). Since both
HA-stains possess an overall positive charge, they migrate towards the cathode
when an electric field is applied. This is in direct opposition to DNA, which
is negatively charged and, therefore, migrates towards the anode. Thus, if two
DNA clones are identical in size (as perceived on a standard high resolution
agarose gel), but differ in AT/GC content, inclusion of a HA-dye in the gel
will effectively retard the migration of one of the sequences compared to the
other, effectively making it apparently larger and, thus, providing a means of
differentiating between the two. The use of HA-red has been shown to resolve
sequences with an AT variation of less than 1% (Wawer et al. 1995), whilst Hanse
Analytik have reported that HA staining is so sensitive that in one case it was used
to distinguish two 567bp sequences which differed by only a single point mutation
(Hanse Analytik 1996, personal communication). Therefore, if one wishes to check
whether all the clones produced from a specific band in a differential display
experiment are derived from the same gene species, a small amount of reamplified
or digested clone can be run on a standard high resolution gel, and a second aliquot
in a similar gel containing one of the HA-stains. The standard gel should indicate
any gross size differences, whilst the HA-stained gel should separate otherwise
unresolvable species (on standard AGE) according to their base content. Geisinger
et al. (1997) reported successful use of this approach for identifying DD-derived
clones. Figure 10 shows such an experiment carried out in this laboratory on clones
obtained from a band extracted from an SSH display.
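
The two separation criteria discussed above, size and base content, can be put into a short sketch. The resolution limits are the figures quoted in the text (0.2% for PAGE, the lower end of 1.5-2% for high resolution agarose); the function names and the idea of comparing lengths against the longer fragment are assumptions made for illustration, and no attempt is made to model how strongly an HA-stain actually retards a given sequence.

```python
PAGE_LIMIT = 0.002     # 0.2% size difference quoted for PAGE
AGAROSE_LIMIT = 0.015  # lower end of the 1.5-2% quoted for high resolution agarose


def resolvable_by_size(len_a, len_b, min_relative_diff):
    """True if two fragment lengths differ by at least the given fraction
    of the longer fragment."""
    longer = max(len_a, len_b)
    return abs(len_a - len_b) / longer >= min_relative_diff


def gc_fraction(seq):
    """Return the fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)
```

Two products that fail `resolvable_by_size` on agarose but have different `gc_fraction` values are the case in which a second gel containing an HA-stain would be expected to help.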

An alternative approach is to carry out a 2-D analysis of the differential display
products. In this approach, size-based separation is first carried out in a standard
agarose gel. The gel slice containing the display is then extracted and incorporated
into a HA gel for resolution based on AT/GC content.

Of course, one should always consider the possibility of there being different 
gene species which are the same size and have the same GC/AT content. However,
such species are not unresolvable given some effort; again, one might use
SSCP, or perhaps a denaturing gradient gel electrophoresis (DGGE) or temperature
gradient gel electrophoresis (TGGE) approach to resolve the contents of a band,
either directly on the extracted band (Suzuki et al. 1991) or on the reamplified
product.

The requirement of some differential display techniques to visualize large 
numbers of products (e.g. DD and GEF) can also present a problem in that, in terms 
of numbers, the resolution of PAGE rarely exceeds 300-400 bands. One approach to
overcoming this might be to use 2-D gels such as those described by Uitterlinden et
al. (1989) and Hatada et al. (1991), ...
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Extraction of differentially expressed bands from a gel can be complex since, in 
some cases (e.g. DD, GEF), the results are visualized by autoradiographic means, 
such that precise overlay of the developed film on the gel must occur if the correct 
band is to be extracted for further analysis. Clearly, a misjudged extraction can 
account for many man-hours lost. This problem, and that of the use of radioisotopes, 
has been addressed by several groups. For example, Lohmann et al. (1995)
demonstrated that silver staining can be used directly to visualize DD bands in
horizontal PAGs. An et al. (1996) avoided the use of radioisotopes by transferring a
small amount (20-30%) of the DNA from their DD to a nylon membrane, and
visualizing the bands using chemiluminescent staining before going back to extract
the remaining DNA from the gel. Chen and Peck (1996) went one step further and
transferred the entire DD to a nylon membrane. The DNA bands were then
visualized using a digoxigenin (DIG) system (DIG was attached to the polydT
primers used in the differential display procedure). Differentially expressed bands
were cut from the membrane and the DNA eluted by washing with PCR buffer prior
to reamplification.

One of the advantages of using techniques such as SSH and RDA is that the final 
display can be run on an agarose gel and the bands visualized with simple ethidium 
bromide staining. Whilst this approach can provide acceptable results, overstaining
with SYBR Green I or SYBR Gold nucleic acid stains (FMC) effectively enhances
the intensity and sharpness of the bands. This greatly aids in their precise extraction
and often reveals some faint products that may otherwise be overlooked. Whilst
differential displays stained with SYBR Green I are better visualized using short
wavelength UV (254 nm) rather than medium wavelength (306 nm), the shorter
wavelength is much more DNA damaging. In practice, it takes only a few seconds
to damage DNA extracted under 254 nm irradiation, effectively preventing
reamplification and cloning. The best approach is to overstain with SYBR Green I
and extract bands under a medium wavelength UV transillumination.

The possible use of 'micro-fingerprinting' to reduce complexity

Given the sheer number of gene products and the possible complexity of each
band, an alternative approach to rapid characterization may be to use an enhanced
analysis of a small section of a differential display, a 'sub-fingerprint' or
'micro-fingerprint'. In this case, one could concentrate on those bands which only appear
in a particular chosen size region. Reducing the fingerprint in this way has at least
two advantages. One is that it should be possible to use different gel types,
concentrations and run times tailored exactly to that region. Currently, one might
run products from 100-3000+ bp on the same gel, which leads to compromise in the
gel system being used and consequently to suboptimal resolution, both in terms of
size and numbers, and can lead to problems in the accurate excision of individual
bands. Secondly, it may be possible to enhance resolution by using a 2-D analysis
using a HA-stain, as described earlier. In summary, if a range of gene product sizes
is carefully chosen to include certain 'relevant' genes, the 2-D system standardized,
and appropriate gene analysis used, it may be possible to develop a method for the
early and rapid identification of compounds which have similar or widely different
cellular effects. If the prognosis for exposure to one or more other chemicals which
display a similar profile is already known, then one could perhaps predict similar
effects for any new compounds which show a similar micro-fingerprint.
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bv h i C Fr?TT" regU K tCd , ,n HVCr0f CXP ° SCd t0 W>-W.M3 and was identified 
by a FASTA search as being transferrin (data not shown). However, transferrin [...]
hypolipidemic proliferators such as WY-14,643 (Hem et al. 1996), and this was
confirmed with subsequent RT-PCR
digestion. This is important for at least two reasons:
which prevent the formation of appropriate hybrids, especially at the high
concentrations required for efficient hybridization.
(2) Cutting the cDNAs into small fragments provides better representation of
individual genes. This is because genes derived from related but distinct
members of gene families often have similar coding sequences that may
cross-hybridize and be eliminated during the subtraction procedure (Ko 1990).

[...] different fragments from the same cDNA may differ considerably
in terms of hybridization and amplification and, thus, may not efficiently do one
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as separate bands on the final differential display, increasing the observed 
redundancy and increasing the number of redundant sequencing reactions. 
Sequence comparisons also throw up another important point: at what degree
of sequence similarity does one accept a result? Is 90% identity between a gene
derived from your model species and another acceptably close? Is 95% between
your sequence and one from the same species also acceptable? This problem is
particularly relevant when the forward and reverse sequence comparisons give
similar sequences with completely different genes. An arbitrary decision
seems to be to allocate genes that are definite (95% and above similarity) and then
group those between 60 and 95% as being related or possible homologues.

Quantitative analysis 

At some point, one must give consideration to the quantitative analysis of the
candidate genes, either as a means of confirming that they are truly differentially
expressed, or in order to establish just what the differences are. Northern blot
analysis is a popular approach as it is relatively easy and quick to perform. However,
the major drawback with Northern blots is that they are often not sensitive enough
to detect rare sequences. Since the majority of messages expressed in a cell are of low
abundance (see table 1), this is a major problem. Consequently, RT-PCR may be the
method of choice for confirming differential expression. Although the procedure is
somewhat more complex than Northern analysis, requiring synthesis of primers and
optimization of reaction conditions for each gene species, it is now possible to set up
high throughput PCR systems using multichannel pipettes, 96+-well plates and



J- C. Rockett et al. 



appropriate thermal cycling technology. Whilst quantitative analysis is more
desirable, being more accurate and [...] an internal standard, the
money and time needed to [...] is often excessive, [...] hundreds of gene species. The
use of semi-quantitative analysis is [...] involved. One
must first of all choose [...] does not change in the test cells
compared to the controls. [...] tried in the past, for example [...] (Heuval et al. 1994),
glyceraldehyde-3-phosphate dehydrogenase (GAPDH, Wong et al. 1994),
dihydrofolate reductase (DHFR, [...] and Butler 1991), β-2-microglobulin
(β-2-m, Murphy et al. [...]), [...] (HPRT, Foss
et al. 1998) and a number of others (ClonTechniques 1997b). Ideally, an internal
standard should not change [...] regardless of cell age,
stage in the cell cycle or [...] external stimuli. However, it has been
shown on numerous occasions that [...] housekeeping genes currently
used by the [...] conditions and in
different tissues (ClonTechniques 1997b). [...] therefore, that
preliminary experiments be carried out [...] housekeeping genes to establish
their suitability for use in the model system.

gam ms.ght into whv two d5e^ 1 ^ rB t?TS? eXPreSS '° n ^ PCrhapS 
For example rats and mi-T " d,fferent wa VS «> external stimuli. 

-*e°f P^lsT^ non-genotox, effects of a w.de 

resistant (Orton « J 1984 J ] i hamsters and pigs are largely 

Makowska et al WlT^ST ^ Lake " fl/ - ,989 > 

compare lists o up -and ^^.ST* " "T^"' ^ r " SOn(S) " "> 
expressed in onlv one ~£ and {TV 0 ° ' " ^""^ th ° Se W ' hich are 
the sa.d gene, m.ght su«« a K * r f baC , kgr0und knowledge of the effects of 

or protean ofco U T h ^ZTs 1 k T K * f° n " gen0t0X,C 

there were one kev e^e Dr ot J , ? " be f " m0 " COm P ,ex - P «haps if 

upregulatedTo timel bv PP S P '* ^ non -*— - effects and it was 

m the rat. hZ^JZ^S^™ T ""^ ° n,y be «P-™««1»«I five times 

gene mav be overlooked W "7 "T " Upre?u4ated - thc «m P orxance of the 

does not necessaruv Zlt l ZT T * ^ Cnan?e ,n «Press,on 

true relevanceTgenT?: ,c HSo^KSS ^ ^ » *« 

and gene Z wh.ch shows onlv TtZlt^X^ * PMt ! CU J ar 

may find that histoncally. gene Y has often 1 CXamin " the htCratUrC ° ne 

fold by a number of J£Z2£^£g&r£ tZ^^ ^ 

appear less significant Hnw^r r ^O-fold increase would 

Problems in using the differential display approach

[...] obtainable [...] up- or down-regulated in test animals/cells in
a developmental process or following exposure to given stimuli. However, it has



become clear that the fingerprinting process, whilst still valid, is much too complex 
to be represented by a single technique profile. This is because all differential display
techniques have common and/or unique technical problems which preclude the
isolation and identification of all those genes which show changes in expression.
Furthermore, there are important genetic changes related to disease development
which differential expression analysis is simply not designed to address. An example
of this is the presence of small deletions, insertions, or point mutations such as those
seen in activated oncogenes, tumour suppressor genes and individual
polymorphisms. Polymorphic variations, small though they usually are, are often
regarded as being of paramount importance in explaining why some patients
respond better than others to certain drug treatments (and, in logical extension, why
some people are less affected by potentially dangerous xenobiotics/carcinogens than
others). The identification of such point mutations and naturally occurring
polymorphisms requires the subsequent application of sequencing, SSCP, DGGE
or TGGE to the gene of interest. Furthermore, differential display is not designed
to address issues such as alternatively spliced gene species or whether an increased
abundance of mRNA is a result of increased transcription or increased mRNA
stability.
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Conclusions 

Perhaps the main advantage of open system differential display techniques is that
they are not limited by extant theories or researcher bias in revealing genes which are
differentially expressed, since they are designed to amplify all genes which
demonstrate altered expression. This means that they are useful for the isolation of
previously unknown genes which may turn out to be useful biomarkers of a particular
state or condition. At least one open system (SAGE) is also quantitative, thus
eliminating the need to return to the original mRNA and carry out Northern/PCR
analysis to confirm the result. However, the rapid progress of genome mapping
projects means that over the next 5-10 years or so, the balance of experimental use
will switch from open to closed differential display systems, particularly DNA
arrays. Arrays are easier and faster to prepare and use, provide quantitative data, are
suitable for high throughput analysis and can be tailored to look at specific signalling
pathways or families of genes. Identification of all the gene sequences in human and
common laboratory animals, combined with improved DNA array technology,
means that it will soon no longer be necessary to try to isolate differentially expressed
genes using the technically more demanding open system approach. Thus, their
main advantage (that of identifying unknown genes) will be largely eradicated. It is
likely, therefore, that their sphere of application will be reduced to analysis of the
less common laboratory species, since it will be some time yet before the genomes of
such animals as zebrafish, electric eels, gerbils, crayfish and squid, for example, will
be sequenced.

Of course, in the end the question will always remain: What is the functional/
biological significance of the identified, differentially expressed genes? One
persistent problem is understanding whether differentially expressed genes are a
cause or consequence of the altered state. Furthermore, many chemicals, such as
non-genotoxic carcinogens, are also mitogens and so genes associated with
replication will also be upregulated but may have little or nothing to do with the
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INTRODUCTION 

Technological advancements combined with in- 
tensive DNA sequencing efforts have generated an 
enormous database of sequence information over the 
past decade. To date, more than 3 million sequences, 
totaling over 2.2 billion bases [1], are contained 
within the GenBank database, which includes the 
complete sequences of 19 different organisms [2]. The 
first complete sequence of a free-living organism, 
Haemophilus influenzae, was reported in 1995 [3] and 
was followed shortly thereafter by the first complete 
sequence of a eukaryote, Saccharomyces cerevisiae [4].
The development of dramatically improved sequenc- 
ing methodologies promises that complete elucida- 
tion of the Homo sapiens DNA sequence is not far 
behind [5]. 

To exploit more fully the wealth of new sequence 
information, it was necessary to develop novel meth- 
ods for the high-throughput or parallel monitoring 
of gene expression. Established methods such as 
northern blotting, RNAse protection assays, S1 nuclease
analysis, plaque hybridization, and slot blots
do not provide sufficient throughput to effectively 
utilize the new genomics resources. Newer methods 
such as differential display [6], high-density filter 
hybridization [7,8], serial analysis of gene expression 
[9], and cDNA- and oligonucleotide-based microarray 
"chip" hybridization [10-12] are possible solutions 
to this bottleneck. It is our belief that the microarray 
approach, which allows the monitoring of expres- 
sion levels of thousands of genes simultaneously, is 
a tool of unprecedented power for use in toxicology 
studies. 



Almost without exception, gene expression is al- 
tered during toxicity, as either a direct or indirect 
result of toxicant exposure. The challenge facing 
toxicologists is to define, under a given set of ex- 
perimental conditions, the characteristic and spe- 
cific pattern of gene expression elicited by a given 
toxicant. Microarray technology offers an ideal plat- 
form for this type of analysis and could be the foun- 
dation for a fundamentally new approach to 
toxicology testing. 

MICROARRAY DEVELOPMENT AND APPLICATIONS 
cDNA Microarrays 

In the past several years, numerous systems were 
developed for the construction of large-scale DNA 
arrays. All of these platforms are based on cDNAs 
or oligonucleotides immobilized to a solid support.
In the cDNA approach, cDNA (or genomic) 
clones of interest are arrayed in a multi-well format
and amplified by polymerase chain reaction. 
The products of this amplification, which are usu- 
ally 500- to 2000-bp clones from the 3' regions of 
the genes of interest, are then spotted onto solid 
support by using high-speed robotics. By using 
this method, microarrays of up to 10 000 clones 
can be generated by spotting onto a glass substrate 
[13,14]. Sample detection for microarrays on glass 
involves the use of probes labeled with fluorescent
or radioactive nucleotides. 

Fluorescent cDNA probes are generated from con- 
trol and test RNA samples in single-round reverse-tran- 
scription reactions in the presence of fluorescently 
tagged dUTP (e.g., Cy3-dUTP and Cy5-dUTP), which 
produces control and test products labeled with different
fluors. The cDNAs generated from these two populations,
collectively termed the "probe," are then mixed and
hybridized to the array under a glass coverslip [10,11,15].
The fluorescent signal is detected by using a custom-designed
scanning confocal microscope equipped with a motorized stage
and lasers for fluor excitation [10,11,15]. The data are analyzed
with custom digital image analysis software that determines
for each DNA feature the ratio of fluor 1 to 
fluor 2, corrected for local background [16,17]. The 
strength of this approach lies in the ability to label 
RNAs from control and treated samples with differ- 
ent fluorescent nucleotides, allowing for the simul- 
taneous hybridization and detection of both 
populations on one microarray. This method elimi- 
nates the need to control for hybridization between 
arrays. The research groups of Drs. Patrick Brown and 
Ron Davis at Stanford University spearheaded the 
effort to develop this approach, which has been suc- 
cessfully applied to studies of Arabidopsis thaliana 
[10], yeast genomic DNA [15], tumorigenic versus
non-tumorigenic human tumor cell lines [11],
human T-cells [18], yeast RNA [19], and human
inflammatory disease-related genes [20]. The most dramatic
result of this effort was the first published 
account of gene expression of an entire genome, that 
of the yeast Saccharomyces cerevisiae [21]. 
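
The per-feature measurement described above (the ratio of fluor 1 to fluor 2, corrected for local background) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the `floor` parameter, and the example intensities are hypothetical and are not taken from the image analysis software cited in [16,17].

```python
import math

def corrected_ratio(fg1, bg1, fg2, bg2, floor=1.0):
    """Ratio of fluor 1 to fluor 2 for one array feature.

    fg1/fg2 are the feature intensities in each channel and bg1/bg2 the
    local background around the feature. `floor` is an assumption of this
    sketch: it keeps a background-corrected signal from reaching zero or
    going negative, which would make the ratio meaningless.
    """
    signal1 = max(fg1 - bg1, floor)
    signal2 = max(fg2 - bg2, floor)
    return signal1 / signal2

def log2_ratio(fg1, bg1, fg2, bg2, floor=1.0):
    """Same ratio on a log2 scale, so induction and suppression are symmetric."""
    return math.log2(corrected_ratio(fg1, bg1, fg2, bg2, floor))

# Hypothetical readings for three features: (fg1, bg1, fg2, bg2)
features = {
    "feature_1": (1100.0, 100.0, 600.0, 100.0),  # about 2-fold higher in fluor 1
    "feature_2": (350.0, 100.0, 1100.0, 100.0),  # 4-fold lower in fluor 1
    "feature_3": (600.0, 100.0, 600.0, 100.0),   # unchanged
}
for name, reading in features.items():
    print(name, round(log2_ratio(*reading), 2))
```

Because both samples are hybridized to the same feature, the ratio is computed within one array, which is why no between-array correction appears in the sketch.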

In an alternative approach, large numbers of cDNA 
clones can be spotted onto a membrane support al- 
beit at a lower density [7,22]. This method is useful 
for expression profiling and large-scale screening and 
mapping of genomic or cDNA clones [7,22-24]. In 
expression profiling on filter membranes, two dif- 
ferent membranes are used simultaneously for con- 
trol and test RNA hybridizations, or a single 
membrane is stripped and reprobed. The signal is 
detected by using radioactive nucleotides and visu- 
alized by phosphorimager analysis or autoradiogra- 
phy. Numerous companies now sell such cDNA 
membranes and software to analyze the image data 
[25-27]. 

Oligonucleotide Microarrays 

Oligonucleotide microarrays are constructed either 
by spotting prefabricated oligos on a glass support 
[13] or by the more elegant method of direct in situ 
oligo synthesis on the glass surface by photolithography
[28-30]. The strength of this approach lies in 
its ability to discriminate DNA molecules based on 
single base-pair difference. This allows the application
of this method to the fields of medical diagnostics,
pharmacogenetics, and sequencing by hybridization,
as well as gene-expression analysis. 

Fabrication of oligonucleotide chips by photolithography
is theoretically simple but technically 
complex [29,30]. The light from a high-intensity
mercury lamp is directed through a photolithographic
mask onto the silica surface, resulting in 
deprotection of the terminal nucleotides in the illuminated
regions. The entire chip is then reacted with 
the desired free nucleotide, resulting in selected chain 
elongation. This process requires only 4n cycles 
(where n = oligonucleotide length in bases) to synthesize
a vast number of unique oligos, the total number
of which is limited only by the complexity of the 
photolithographic mask and the chip size [29,31,32]. 

Sample preparation involves the generation of 
double-stranded cDNA from cellular poly(A)+ RNA,
followed by antisense RNA synthesis in an in vitro 
transcription reaction with biotinylated or fluor-tagged
nucleotides. The RNA probe is then fragmented
to facilitate hybridization. If the indirect 
visualization method is used, the chips are incubated 
with fluor-linked streptavidin (e.g., phycoerythrin) 
after hybridization [12,33]. The signal is detected with 
a custom confocal scanner [34]. This method has 
been applied successfully to the mapping of genomic 
library clones [35], to de novo sequencing by hybridization
[28,36], and to evolutionary sequence comparison
of the BRCA1 gene [37]. In addition, 
mutations in the cystic fibrosis [38] and BRCA1 [39] 
gene products and polymorphisms in the human immunodeficiency
virus-1 clade B protease gene [40] 
have been detected by this method. Oligonucleotide 
chips are also useful for expression monitoring [33], 
as has been demonstrated by the simultaneous evaluation
of gene-expression patterns in nearly all open 
reading frames of the yeast strain S. cerevisiae [12]. 
More recently, oligonucleotide chips have been used 
to help identify single nucleotide polymorphisms in 
the human [41] and yeast [42] genomes. 
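
The 4n figure given above for photolithographic synthesis follows from visiting each of the n positions once per nucleotide, with the mask at each step exposing only the features that need that nucleotide next. The sketch below illustrates the counting argument only; the function names and the example oligos are hypothetical, and real mask design involves chemistry and optics that are not modeled here.

```python
BASES = "ACGT"

def synthesis_schedule(oligos):
    """Return the coupling steps needed to build every oligo in parallel.

    Each step is (position, base, mask), where mask lists the feature
    indices that are deprotected and receive `base` at `position`.
    The number of steps is 4 * n no matter how many oligos are built.
    """
    n = len(oligos[0])
    if any(len(o) != n for o in oligos):
        raise ValueError("all oligos must have the same length")
    steps = []
    for position in range(n):
        for base in BASES:
            mask = [i for i, o in enumerate(oligos) if o[position] == base]
            steps.append((position, base, mask))
    return steps

def run_schedule(steps, feature_count):
    """Replay the schedule and return the sequence grown at each feature."""
    grown = [""] * feature_count
    for _position, base, mask in steps:
        for i in mask:
            grown[i] += base
    return grown

oligos = ["ACGT", "TTTT", "GATC"]          # hypothetical 4-mers
steps = synthesis_schedule(oligos)
print(len(steps))                           # 4 bases x 4 positions
print(run_schedule(steps, len(oligos)))     # reproduces the input oligos
```

Adding more features to `oligos` lengthens the masks but not the schedule, which is the sense in which the number of unique oligos is limited by the mask and chip size rather than by the number of cycles.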

THE USE OF MICROARRAYS IN TOXICOLOGY 
Screening for Mechanism of Action 

The field of toxicology uses numerous in vivo 
model systems, including the rat, mouse, and rabbit,
to assess potential toxicity, and these bioassays
are the mainstay of toxicology testing. However, in 
the past several decades, a plethora of in vitro techniques
have been developed to measure toxicity,
many of which measure toxicant-induced DNA damage.
Examples of these assays include the Ames test,
the Syrian hamster embryo cell transformation assay,
micronucleus assays, measurements of sister 
chromatid exchange and unscheduled DNA synthesis,
and many others. Fundamental to all of these 
methods is the fact that toxicity is often preceded 
by, and results in, alterations in gene expression. In 
many cases, these changes in gene expression are a 
far more sensitive, characteristic, and measurable 
endpoint than the toxicity itself. We therefore propose
that a method based on measurements of the 
genome-wide gene expression pattern of an organ- 
ism after toxicant exposure is fundamentally infor- 
mative and complements the established methods 
described above. 

We are developing a method by which toxicants 
can be identified and their putative mechanisms of 
action determined by using toxicant-induced gene-expression
profiles. In this method, in one or more defined
model systems, dose and time-course parameters 
are established for a series of toxicants within a given 
prototypic class (e.g., polycyclic aromatic hydrocarbons
(PAHs)). Cells are then treated with these agents 
at a fixed toxicity level (as measured by cell survival), 
RNA is harvested, and toxicant-induced gene-expression
changes are assessed by hybridization to a cDNA 
microarray chip (Figure 1). We have developed a custom
DNA chip, called ToxChip v1.0, specifically for 
this purpose and will discuss it in more detail below. 
The changes in gene expression induced by the test 
agents in the model systems are analyzed, and the 
common set of changes unique to that class of toxicants,
termed a toxicant signature, is determined. 

This signature is derived by ranking across all experiments
the gene-expression data based on relative
fold induction or suppression of genes in treated 
samples versus untreated controls and selecting the 
most consistently different signals across the sample 
set. A different signature may be established for each 
prototypic toxicant class. Once the signatures are determined,
gene-expression profiles induced by unknown
agents in these same model systems can then 
be compared with the established signatures. A match 
assigns a putative mechanism of action to the test 
compound. Figure 2 illustrates this signature method 
for different types of oxidant stressors, PAHs, and 
peroxisome proliferators. In this example, the unknown
compound in question had a gene-expression
profile similar to that of the oxidant stressors in 
the database. We anticipate that this general method 
will also reveal cross talk between different pathways 
induced by a single agent (e.g., reveal that a compound
has both PAH-like and oxidant-like properties).
In the future, it may be necessary to distinguish 
very subtle differences between compounds within 
a very large sample set (e.g., thousands of highly similar
structural isomers in a combinatorial chemistry 
library or peptide library). To generate these highly 
refined signatures, standard statistical clustering techniques
or principal-component analysis can be used. 
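
The signature idea described above can be illustrated with a deliberately simplified sketch. It does not reproduce the authors' ranking procedure: it assumes that "most consistently different" can be approximated by requiring a gene to pass a fold-change threshold in the same direction in every experiment of a class. The threshold, the scoring rule, the function names, and the gene labels (A1, A2, B1, C1) are all hypothetical.

```python
def derive_signature(experiments, min_abs_log2=1.0):
    """Return {gene: +1 or -1} for genes changed consistently in a class.

    `experiments` is a list of dicts mapping gene -> log2(treated/control).
    A gene enters the signature only if it is induced (or suppressed) by at
    least `min_abs_log2` in every experiment.
    """
    shared_genes = set.intersection(*(set(e) for e in experiments))
    signature = {}
    for gene in sorted(shared_genes):
        values = [e[gene] for e in experiments]
        if all(v >= min_abs_log2 for v in values):
            signature[gene] = 1
        elif all(v <= -min_abs_log2 for v in values):
            signature[gene] = -1
    return signature

def match_score(profile, signature, min_abs_log2=1.0):
    """Fraction of signature genes changed in the expected direction."""
    if not signature:
        return 0.0
    hits = sum(
        1 for gene, direction in signature.items()
        if direction * profile.get(gene, 0.0) >= min_abs_log2
    )
    return hits / len(signature)

def classify(profile, signatures):
    """Return the best-matching class name and all class scores."""
    scores = {name: match_score(profile, sig) for name, sig in signatures.items()}
    return max(scores, key=scores.get), scores

# Hypothetical log2 ratios for two toxicant classes and one unknown agent.
oxidant_runs = [{"A1": 2.1, "A2": 1.4, "B1": 0.1, "C1": 0.0},
                {"A1": 1.8, "A2": 1.2, "B1": -0.2, "C1": 0.3}]
pah_runs = [{"A1": 0.2, "A2": 0.1, "B1": 3.0, "C1": 0.1},
            {"A1": 0.0, "A2": -0.3, "B1": 2.5, "C1": -0.1}]
signatures = {"oxidant": derive_signature(oxidant_runs),
              "PAH": derive_signature(pah_runs)}
unknown = {"A1": 1.9, "A2": 1.1, "B1": 0.2, "C1": 0.0}
print(classify(unknown, signatures))
```

A compound with mixed properties would score above zero for more than one class, which is one simple way the cross talk mentioned above could surface; clustering or principal-component analysis would replace this scoring for finer distinctions.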

For the studies outlined in Figure 2, we developed 
the custom cDNA microarray chip ToxChip v1.0. 

[Figure 1 panel labels: Control Population and Treated Population; RNA Isolation; Reverse Transcription (Cy3); Mix cDNAs and Apply to Array; DNA "Chip"; Hybridize Under Coverslip]

Figure 1. Simplified overview of the method for sample 
preparation and hybridization to cDNA microarrays. For illustrative
purposes, samples derived from cell culture are depicted,
although other sample types are amenable to this analysis. 

Figure 2. Schematic representation of the method for identification
of a toxicant's mechanism of action. In this method,
gene-expression data derived from exposure of model systems
to known toxicants are analyzed, and a set of changes 
characteristic to that type of toxicant (termed the toxicant 
signature) is identified. As depicted, oxidant stressors produce 
consistent changes in group A genes (indicated by red and 
green circles), but not group B or C genes (indicated by gray 
circles). The set of gene-expression changes elicited by the 
suspected toxicant is then compared with these characteristic 
patterns, and a putative mechanism of action is assigned to 
the unknown agent. 



The 2090 human genes that comprise this subarray 
were selected for their well-documented involve- 
ment in basic cellular processes as well as their re- 
sponses to different types of toxic insult. Included 
on this list are DNA replication and repair genes, 
apoptosis genes, and genes responsive to PAHs and 
dioxin-like compounds, peroxisome proliferators, 
estrogenic compounds, and oxidant stress. Some of 
the other categories of genes include transcription 
factors, oncogenes, tumor suppressor genes, cyclins, 
kinases, phosphatases, cell adhesion and motility 
genes, and homeobox genes. Also included in this 
group are 84 housekeeping genes, whose hybridiza- 
tion intensity is averaged and used for signal nor- 
malization of the other genes on the chip. To date, 
very few toxicants have been shown to have appre- 
ciable effects on the expression of these housekeep- 
ing genes. However, this housekeeping list will be 
revised if new data warrant the addition or deletion 
of a particular gene. Table 1 contains a general de- 
scription of some of the different classes of genes 
that comprise ToxChip v1.0. 
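
The normalization step mentioned above, in which the hybridization intensity of the housekeeping genes is averaged and used to normalize the other genes, can be sketched as follows. The function name, the gene labels, and the intensities are hypothetical; the source does not specify whether a plain arithmetic mean or another average is used, so the arithmetic mean here is an assumption.

```python
def normalize_to_housekeeping(intensities, housekeeping_genes):
    """Divide every gene's intensity by the mean housekeeping intensity.

    `intensities` maps gene -> background-corrected signal on one chip.
    `housekeeping_genes` lists the genes treated as the reference set.
    """
    reference = [intensities[g] for g in housekeeping_genes if g in intensities]
    if not reference:
        raise ValueError("no housekeeping genes found on this chip")
    mean_reference = sum(reference) / len(reference)
    if mean_reference <= 0:
        raise ValueError("housekeeping signal must be positive")
    return {gene: value / mean_reference for gene, value in intensities.items()}

# Hypothetical chip with two housekeeping genes and two test genes.
chip = {"hk_1": 100.0, "hk_2": 300.0, "gene_x": 400.0, "gene_y": 50.0}
normalized = normalize_to_housekeeping(chip, ["hk_1", "hk_2"])
print(normalized["gene_x"], normalized["gene_y"])
```

If a toxicant did alter a housekeeping gene, every normalized value on the chip would shift, which is why the text notes that the housekeeping list will be revised when new data warrant it.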

When a toxicant signature is determined, the 
genes within this signature are flagged within the 
database. When uncharacterized toxicants are then 
screened, the data can be quickly reformatted so that 
blocks of genes representing the different signatures 



are displayed [11]. This facilitates rapid, visual in- 
terpretation of data. We are also developing Tox- 
Chip v2.0 and chips for other model systems, 
including rat, mouse, Xenopus, and veast, for use in 
toxicology studies. 

Animal Models in Toxicology Testing 

The toxicology community relies heavily on the 
use of animals as model systems for toxicology testing.
Unfortunately, these assays are inherently expensive,
require large numbers of animals, and take a 
long time to complete and analyze. Therefore, the 
National Institute of Environmental Health Sciences 
(NIEHS), the National Toxicology Program, and the 
toxicology community at large are committed to re- 
ducing the number of animals used, by developing 
more efficient and alternative testing methodologies. 
Although substantial progress has been made in the 
development of alternative methods, bioassays are 
still used for testing endpoints such as neurotoxicity,
immunotoxicity, reproductive and developmental
toxicology, and genetic toxicology. The rodent 
cancer bioassay is a particularly expensive and time-consuming
assay, as it requires almost 4 yr, 1200 
animals, and millions of dollars to execute and analyze
[43]. In vitro experiments of the type outlined 
in Figure 2 might provide evidence that an unknown 



Table 1. ToxChip v1.0: A Human cDNA 
Chip Designed to Detect Responses to 

Gene category*                          No. of genes on chip

Apoptosis                                72
DNA replication and repair               99
Oxidative stress/redox homeostasis       90
Peroxisome proliferator responsive       22
Dioxin/PAH responsive                    12
Estrogen responsive                      63
Housekeeping                             84
Oncogenes and tumor suppressor genes     76
Cell-cycle control                       51
Transcription factors                   131
Kinases                                 276
Phosphatases                             88
Heat-shock proteins                      23
Receptors                               349
Cytochrome P450s                         30

*This list is intended as a general guide. The gene categories are not 
unique, and some genes are listed in multiple categories. 

agent is (or is not) responsible for eliciting a given 
biological response. This information would help to 
select a bioassay more specifically suited to the agent 
in question or perhaps suggest that a bioassay is not 
necessary, which would dramatically reduce cost, 
animal use, and time. 

The addition of microarray techniques to standard
bioassays may dramatically enhance the sensitivity
and interpretability of the bioassay and 
possibly reduce its cost. Gene-expression signatures 
could be determined for various types of tissue-spe- 
cific toxicants, and new compounds could be 
screened for these characteristic signatures, provid- 
ing a rapid and sensitive in vivo test. Also, because 
gene expression is often exquisitely sensitive to low 
doses of a toxicant, the combination of gene-expres- 
sion screening and the bioassay might allow the use 
of lower toxicant doses, which are more relevant to 
human exposure levels, and the use of fewer animals.
In addition, gene-expression changes are normally
measured in hours or days, not in the months 
to years required for tumor development. Further- 
more, microarrays might be particularly useful for 
investigating the relationship between acute and 
chronic toxicity and identifying secondary effects 
of a given toxicant by studying the relationship 
between the duration of exposure to a toxicant and 
the gene-expression profile produced. Thus, a bioassay
that incorporates gene-expression signatures 
with traditional endpoints might be substantially 
shorter, use more realistic dose regimens, and cost 
substantially less than the current assays do. 

These considerations are also relevant for branches 
of toxicology not related to human health and not 
using rodents as model systems, such as aquatic toxi- 
cology and plant pathology. Bioassays based on the 
fathead minnow, Daphnia, and Arabidopsis could 
also be improved by the addition of microarray analysis.
The combination of microarrays with traditional 
bioassays might also be useful for investigating some 
of the more intractable problems in toxicology re- 
search, such as the effects of complex mixtures and 
the difficulties in cross-species extrapolation. 

Exposure Assessment, Environmental Monitoring, 
and Drug Safety 

The currently used methods for assessment of ex- 
posure to chemical toxicants are based on measure- 
ment of tissue toxin levels or on surrogate markers 
of toxicity, termed biomarkers (e.g., peripheral blood 
levels of hepatic enzymes or DNA adducts). Because 
gene expression is a sensitive endpoint, gene expres- 
sion as measured with microarray technology may 
be useful as a new biomarker to more precisely identify
hazards and to assess exposure. Similarly, 
microarrays could be used in an environmental-monitoring
capacity to measure the effect of potential
contaminants on the gene-expression profiles 
of resident organisms. In an analogous fashion, 
microarrays could be used to measure gene-expression
endpoints in subjects in clinical trials. The combination
of these gene-expression data and more 
established toxic endpoints in these trials could be 
used to define highly precise surrogates of safety. 

Gene-expression profiles in samples from exposed 
individuals could be compared to the profiles of the 
same individuals before exposure. From this infor- 
mation, the nature of the toxic exposure can be de- 
termined or a relative clinical safety factor estimated. 
In the future it may also be possible to estimate not 
only the nature but the dose of the toxicant for a 
given exposure, based on relative gene-expression 
levels. This general approach may be particularly 
appropriate for occupational-health applications in 
which unexposed and exposed samples from the 
same individuals may be obtainable. For example, 
a pilot study of gene expression in peripheral-blood 
lymphocytes of Polish coke-oven workers exposed 
to PAHs (and many other compounds) is under con- 
sideration at the NIEHS. An important consideration 
for these types of studies is that gene expression can 
be affected by numerous factors, including diet, 
health, and personal habits. To reduce the effects 
of these confounding factors, it may be necessary 
to compare pools of control samples with pools of 
treated samples. In the future it may be possible to 
compare exposed sample sets to a national database 
of human-expression data, thus eliminating the 
need to provide an unexposed sample from the same 
individual. Efforts to develop such a national gene- 
expression database are currently under way [44,45]. 
However, this national database approach will re- 
quire a better understanding of genome-wide gene 
expression across the highly diverse human popu- 
lation and of the effects of environmental factors 
on this expression. 






NUWAYSIR ET AL. 



Alleles, Oligo Arrays, and Toxicogenetics 

Gene sequences vary between individuals, and 
this variability can be a causative factor in human 
diseases of environmental origin [46,47]. A new area 
of toxicology, termed toxicogenetics, was recently 
developed to study the relationship between genetic 
variability and toxicant susceptibility. This field is 
not the subject of this discussion, but it is worthwhile
to note that the ability of oligonucleotide arrays
to discriminate DNA molecules based on single 
base-pair differences makes these arrays uniquely 
useful for this type of analysis. Recent reports demonstrated
the feasibility of this approach [41,42]. 
The NIEHS has initiated the Environmental Genome 
Project to identify common sequence polymorphisms
in 200 genes thought to be involved in environmental
diseases [48]. In a pilot study on the 
feasibility of this application to the Environmental 
Genome Project, oligonucleotide arrays will be used 
to resequence 20 candidate genes. This toxicogenetic 
approach promises to dramatically improve our understanding
of interindividual variability in disease 
susceptibility. 

FUTURE PRIORITIES 
There are many issues that must be addressed be- 
fore the full potential of microarrays in toxicology 
research can be realized. Among these are model sys- 
tem selection, dose selection, and the temporal na- 
ture of gene expression. In other words, in which 
species, at what dose, and at what time do we look 
for toxicant-induced gene expression? If human 
samples are analyzed, how variable is global gene 
expression between individuals, before and after toxi- 
cant exposure? What are the effects of age, diet, and 
other factors on this expression? Experience, in the 
form of large data sets of toxicant exposures, will 
answer these questions. 

One of the most pressing issues for array scientists 
is the construction of a national public database 
(linked to the existing public databases) to serve as a 
repository for gene-expression data. This relational 
database must be made available for public use, and 
researchers must be encouraged to submit their ex- 
pression data so that others may view and query the 
information. Researchers at the National Institutes 
of Health have made laudable progress in develop- 
ing the first generation of such a database [44,45]. In 
addition, improved statistical methods for gene clus- 
tering and pattern recognition are needed to ana- 
lyze the data in such a public database. 

The proliferation of different platforms and meth- 
ods for microarray hybridizations will improve 
sample handling and data collection and analysis and 
reduce costs. However, the variety of microarray 
methods available will create problems of data compatibility
between platforms. In addition, the near-infinite
variety of experimental conditions under 
which data will be collected by different laboratories
will make large-scale data analysis extremely difficult.
To help circumvent these future problems, a 
set of standards to be included on all platforms 
should be established. These standards would facilitate
data entry into the national database and serve 
as reference points for cross-platform and inter-laboratory
data analysis. 

Many issues remain to be resolved, but it is clear 
that new molecular techniques such as microarray 
hybridization will have a dramatic impact on toxicol- 
ogy research. In the future, the information gathered 
from microarray-based hybridization experiments will 
form the basis for an improved method to assess the 
impact of chemicals on human and environmental 
health. 

Abstract 

Recent progress in genomics and proteomics technologies has created a unique opportunity to significantly impact 
the pharmaceutical drug development processes. The perception that cells and whole organisms express specific 
inducible responses to stimuli such as drug treatment implies that unique expression patterns, molecular fingerprints, 
indicative of a drug's efficacy and potential toxicity are accessible. The integration into state-of-the-art toxicology of 
assays allowing one to profile treatment-related changes in gene expression patterns promises new insights into 
mechanisms of drug action and toxicity. The benefits will be improved lead selection, and optimized monitoring of 
drug efficacy and safety in pre-clinical and clinical studies based on biologically relevant tissue and surrogate markers. 
© 2000 Elsevier Science Ireland Ltd. All rights reserved. 

Expression profiling in toxicology 
limitations 




1. Introduction 

The majority of drugs act by binding to protein 
targets, most to known proteins representing en- 
zymes, receptors and channels, resulting in effects 
such as enzyme inhibition and impairment of 
signal transduction. The treatment-induced per- 
turbations provoke feedback reactions aiming to 
compensate for the stimulus, which almost always 
are associated with signals to the nucleus, resulting
in altered gene expression. Such gene expression
regulations account for both the 
pharmacological action and the toxicity of a drug 
and can be visualized by either global mRNA or 
global protein expression profiling. Hence, for 
each individual drug, a characteristic gene regulation
pattern, its molecular fingerprint, exists 
which bears valuable information on its mode of 
action and its mechanism of toxicity. 

Gene expression is a multistep process that 
results in an active protein (Fig. 1). There exist 
numerous regulation systems that exert control at 
and after the transcription and the translation 
step. Genomics, by definition, encompasses the 
quantitative analysis of transcripts at the mRNA 
level, while the aim of proteomics is to quantify 
gene expression further down-stream, creating a 
snapshot of gene regulation closer to ultimate cell 
function control. 



0378-4274/00/$ - see front matter © 2000 Elsevier Science Ireland Ltd. All rights reserved. 
PII: S0378-4274(99)00236-2 






S. Steiner, N.L. Anderson / Toxicology Letters 112-113 (2000) 467-471 



2. Global mRNA profiling 

Expression data at the mRNA level can be 
produced using a set of different technologies 
such as DNA microarrays, reverse transcript 
imaging, amplified fragment length polymorphism 
(AFLP), serial analysis of gene expression 
(SAGE) and others. Currently, DNA microarrays 
are very popular and promise a great potential. 
On a typical array, each gene of interest is represented
either by a long DNA fragment (200-2400 
bp) typically generated by polymerase chain reaction
(PCR) and spotted on a suitable substrate 
using robotics (Schena et al., 1995; Shalon et al., 
1996) or by several short oligonucleotides (20-30 
bp) synthesized directly onto a solid support using 
photolabile nucleotide chemistry (Fodor et al., 
1993; Chee et al., 1996). From control and treated 
tissues, total RNA or mRNA is isolated and 
reverse transcribed in the presence of radioactive 
or fluorescent labeled nucleotides, and the labeled 
probes are then hybridized to the arrays. The 
intensity of the array signal is measured for each 
gene transcript by either autoradiography or laser 
scanning confocal microscopy. The ratio between 
the signals of control and treated samples reflects 
the relative drug-induced change in transcript 
abundance. 



3. Global protein profiling 

Global quantitative expression analysis at the 
protein level is currently restricted to the use of 
two-dimensional gel electrophoresis. This tech- 
nique combines separation of tissue proteins by 
isoelectric focusing in the first dimension and by 
sodium dodecyl sulfate slab gel electrophoresis- 
based molecular weight separation on the second, 
orthogonal dimension (Anderson et al., 1991). 
The product is a rectangular pattern of protein 
spots that are typically revealed by Coomassie 
Blue, silver or fluorescent staining (Fig. 2). 
Protein spots are identified by mass spectrometry 
following generation of peptide mass fingerprints 
(Mann et al., 1993) and sequence tags (Wilkins et 
al., 1996). Similar to the mRNA approach, the 
optical densities of spots from 
control and treated samples are compared to 
search for treatment-related changes. 



4. Expression data analysis 

Bioinformatics forms a key element required to 
organize, analyze and store expression data from 
either source, the mRNA or the protein level. The 
overall objective, once a mass of high-quality 
quantitative expression data has been collected, is 
to visualize complex patterns of gene expression 
changes, to detect pathways and sets of genes 
tightly correlated with treatment efficacy and toxi- 
city, and to compare the effects of different sets of 
treatment (Anderson et al., 1996). As the drug 
effect database is growing, one may detect similar- 
ities and differences between the molecular finger- 
prints produced by various drugs, information 
that may be crucial to make a decision whether to 
refocus or extend the therapeutic spectrum of a 
drug candidate. 



5. Comparison of global mRNA and protein 
expression profiling 

There are several synergies and overlaps of data 
obtained by mRNA and protein expression analy- 
sis. Low abundant transcripts may not be easily 
quantified at the protein level using standard two- 
dimensional gel electrophoresis analysis and their 
detection may require prefractionation of sam- 
ples. The expression of such genes may be prefer- 
ably quantified at the mRNA level using 
techniques allowing PCR-mediated target amplification.
Tissue biopsy samples typically yield good 
quality of both mRNA and proteins; however, the 
quality of mRNA isolated from body fluids is 
often poor due to the faster degradation of 
mRNA when compared with proteins. RNA samples
from body fluids such as serum or urine are 
often not very 'meaningful', and secreted proteins 
are likely more reliable surrogate markers for 
treatment efficacy and safety. Detection of post-translational
modifications, events often related to 
function or nonfunction of a protein, is restricted 
to protein expression analysis and rarely can be 
predicted by mRNA profiling. Information on 
subcellular localization and translocation of 
proteins has to be acquired at the level of the 
protein in combination with sample prefractiona- 
tion procedures. The growing evidence of a poor 
correlation between mRNA and protein abun- 
dance (Anderson and Seilhamer, 1997) further 
suggests that the two approaches, mRNA and 
protein profiling, are complementary and should 
be applied in parallel. 
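
One simple way to put a number on how well mRNA and protein abundances agree is a correlation coefficient computed over genes measured at both levels. The sketch below uses the Pearson coefficient as an assumption; the source does not state which statistic Anderson and Seilhamer (1997) used, and the abundance values shown are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical relative abundances for the same five genes.
mrna_abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
protein_abundance = [2.0, 1.0, 4.0, 3.0, 5.0]
print(round(pearson(mrna_abundance, protein_abundance), 2))
```

A coefficient near 1 would mean mRNA levels predict protein levels well; values well below 1 are the situation the text describes as motivating parallel use of both profiling approaches.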

6. Expression profiling and drug development 

Understanding the mechanisms of action and 
toxicity, and being able to monitor treatment 
efficacy and safety during trials is crucial for the 
successful development of a drug. Mechanistic 
insights are essential for the interpretation of drug 
effects and enhance the chances of recognizing 
potential species specificities contributing to an 
improved risk profile in humans (Richardson et 
al., 1993; Steiner et al., 1996b; Aicher et al., 1998). 
The value of expression profiling further increases 
when links between treatment-induced expression 
profiles and specific pharmacological and toxic 
endpoints are established (Anderson et al., 1991, 
1995, 1996; Steiner et al., 1996a). Changes in gene 
expression are known to precede the manifesta- 
tion of morphological alterations, giving expres- 
sion profiling a great potential for early 
compound screening, enabling one to select drug 
candidates with wide therapeutic windows 
reflected by molecular fingerprints indicative of 
high pharmacological potency and low toxicity 
(Arce et al., 1998). In later phases of drug development,
surrogate markers of treatment efficacy 
and toxicity can be applied to optimize the monitoring
of pre-clinical and clinical studies (Doherty 
et al., 1998). 



7. Perspectives 

The basic methodology of safety evaluation has 
changed little during the past decades. Toxicity in 
laboratory animals has been evaluated primarily 
by using hematological, clinical chemistry and 
histological parameters as indicators of organ 
damage. The rapid progress in genomics and proteomics
technologies creates a unique opportunity 
to dramatically improve the predictive power of 
safety assessment and to accelerate the drug development
process. Application of gene and protein 
expression profiling promises to improve lead se- 
lection, resulting in the development of drug can- 
didates with higher efficacy and lower toxicity. 
The identification of biologically relevant surro- 
gate markers correlated with treatment efficacy 
and safety bears a great potential to optimize the 
monitoring of pre-clinical and clinical trials.



References 

Aicher, L., Wahl, D., Arce, A., Grenet, O., Steiner, S., 1998.
New insights into cyclosporine A nephrotoxicity by proteome
analysis. Electrophoresis 19, 1998-2003.

Anderson, N.L., Seilhamer, J., 1997. A comparison of selected
mRNA and protein abundances in human liver.
Electrophoresis 18, 533-537.

Anderson, N.L., Esquer-Blasco, R., Hofmann, J.P., Anderson,
N.G., 1991. A two-dimensional gel database of rat liver
proteins useful in gene regulation and drug effects studies.
Electrophoresis 12, 907-930.

Anderson, L., Steele, V.K., Kelloff, G.J., Sharma, S., 1995.
Effects of oltipraz and related chemoprevention compounds
on gene expression in rat liver. J. Cell. Biochem.
Suppl. 22, 108-116.

Anderson, N.L., Esquer-Blasco, R., Richardson, F., Foxworthy,
P., Eacho, P., 1996. The effects of peroxisome proliferators
on protein abundances in mouse liver. Toxicol. Appl.
Pharmacol. 137, 75-89.

Arce, A., Aicher, L., Wahl, D., Esquer-Blasco, R., Anderson,
N.L., Cordier, A., Steiner, S., 1998. Changes in the liver
proteome of female Wistar rats treated with the hypoglycemic
agent SDZ PGU 693. Life Sci. 63, 2243-2250.



S. Steiner, N.L. Anderson / Toxicology Letters 112-113 (2000) 467-471



Chee, M., Yang, R., Hubbell, E., Berno, A., Huang, X.C.,
Stern, D., Winkler, J., Lockhart, D.J., Morris, M.S.,
Fodor, S.P., 1996. Accessing genetic information with
high-density DNA arrays. Science 274, 610-614.

Doherty, N.S., Littman, B.H., Reilly, K., Swindell, A.C., Buss,
J., Anderson, N.L., 1998. Analysis of changes in acute-phase
plasma proteins in an acute inflammatory response
and in rheumatoid arthritis using two-dimensional gel
electrophoresis. Electrophoresis 19, 355-363.

Fodor, S.P., Read, J.L., Pirrung, M.C., Stryer, L., Lu, A.T.,
Solas, D., 1991. Light-directed, spatially addressable
parallel chemical synthesis. Science 251, 767-773.

Mann, M., Hojrup, P., Roepstorff, P., 1993. Use of mass
spectrometry molecular weight information to identify
proteins in sequence databases. Biol. Mass Spectrom. 22,
338-345.

Richardson, F.C., Strom, S.C., Copple, D.M., Bendele, R.A.,
Probst, G.S., Anderson, N.L., 1993. Comparisons of
protein changes in human and rodent hepatocytes induced
by the rat-specific carcinogen, methapyrilene.
Electrophoresis 14, 157-161.



Schena, M., Shalon, D., Davis, R.W., Brown, P.O., 1995.
Quantitative monitoring of gene expression patterns with
a complementary DNA microarray. Science 270, 467-470.

Shalon, D., Smith, S.J., Brown, P.O., 1996. A DNA microarray
system for analyzing complex DNA samples using
two-color fluorescent probe hybridization. Genome Res. 6,
639-645.

Steiner, S., Wahl, D., Mangold, B.L.K., Robison, R.,
Raymackers, J., Meheus, L., Anderson, N.L., Cordier, A.,
1996a. Induction of the adipose differentiation-related
protein in liver of etomoxir-treated rats. Biochem. Biophys.
Res. Commun. 218, 777-782.

Steiner, S., Aicher, L., Raymackers, J., Meheus, L.,
Esquer-Blasco, R., Anderson, L., Cordier, A., 1996b. Cyclosporin
A mediated decrease in the rat renal calcium binding
protein calbindin-D 28 kDa. Biochem. Pharmacol. 51,
253-258.

Wilkins, M.R., Gasteiger, E., Sanchez, J.C., Appel, R.D.,
Hochstrasser, D.F., 1996. Protein identification with
sequence tags. Curr. Biol. 6, 1543-1544.



Docket No.:Pm)505. 2 div 

USSN: 09/828,423

Subject: RE: [Fwd: Toxicology Chip]
Date: Mon, 3 Jul 2000 08:09:45 -0400
From: "Afshari, Cynthia" <afshari@niehs.nih.gov>
To: "Diana Hamlet-Cox" <dianahc@incyte.com>



You can see the list of clones that we have on our 12K chip at
[...].

We selected a subset of genes (2000K) that we believed [...]
response and basic cellular processes and added a set of [...].
We have included a set of control genes (80+) that we [...]
the [...] because they did not change across a large set of array
experiments. However, we have found that some of these genes change
significantly after tox treatments and are in the process of [...]
variation of each of these 80+ genes across our experiments.
Our chips are constantly changing and being updated and we hope that our
data will lead us to what the toxchip should really be.
I hope this answers your question.
Cindy Afshari



> From: Diana Hamlet-Cox
> Sent: Monday, June 26, 2000 8:52 PM
> To: afshari@niehs.nih.gov
> Subject: [Fwd: Toxicology Chip]
>
> Dear Dr. Afshari,
>
> Since I have not yet had a response from Bill Grigg, perhaps he was not
> the right person to contact.
>
> Can you help me in this matter? I don't need to know the sequences
> [...], but I would like very much to know what types of sequences
> are being used, e.g., GPCRs (more specific?), ion channels, etc.
>
> Diana Hamlet-Cox
>


> -------- Original Message --------
> Subject: Toxicology Chip
> Date: Mon, 19 Jun 2000 28:31:48 -0700
> From: Diana Hamlet-Cox <dianahc@incyte.com>
> Organization: Incyte Pharmaceuticals
> To: grigg@niehs.nih.gov
>
> Dear Colleague:
>
> I am doing literature research on the use of expressed genes as
> [...] markers, and found the Press Release dated February
> [...] regarding the work of the [...]. I would like to
> know if there is a resource I can access (or you could provide) [...]
> microarray. In particular, I am interested in the criteria used to
> [...] the ToxChip, including any control sequences
> included in the microarray.
>
> Thank you for your assistance in this request.
>
> Diana Hamlet-Cox, Ph.D.
> Incyte Genomics, Inc.
>






Electrophoresis 1994, 15, 529-539






Bengt Bjellqvist*
Bodil Basse
Eydfinnur Olsen
Julio E. Celis

Institute of Medical Biochemistry
and Danish Centre for Human
Genome Research, Aarhus
University, Aarhus



Reference points for comparisons of two-dimensional 
maps of proteins from different human cell types 
defined in a pH scale where isoelectric points correlate 
with polypeptide compositions 



A highly reproducible, commercial and nonlinear, wide-range immobilized pH
gradient (IPG) was used to generate two-dimensional (2-D) gel maps of
[35S]methionine-labeled proteins from noncultured, unfractionated normal
human epidermal keratinocytes. Forty-one proteins, common to most human
cell types and recorded in the human keratinocyte 2-D gel protein database,
were identified in the 2-D gel maps and their isoelectric points (pI) were
determined using narrow-range IPGs. The latter established a pH scale that
allowed comparisons between 2-D gel maps generated either with other IPGs
in the first dimension or with different human protein samples. Of the 41
proteins identified, a subset of 18 was defined as suitable to evaluate the
correlation between calculated and experimental pI values for polypeptides
with known composition. The variance calculated for the discrepancies between
calculated and experimental pI values for these proteins was 0.001 pH units.
Comparison of the values by the t-test for dependent samples (paired test)
gave a p-level of 0.49, indicating that there is no significant difference
between the calculated and experimental pI values. The precision of the
calculated values depended on the buffer capacity of the proteins, and on
average, it improved with increased buffer capacity. As shown here, the
widely available information on protein sequences cannot, a priori, be
assumed to be sufficient for calculating pI values because post-translational
modifications, in particular N-terminal blockage, pose a major problem. Of
the 36 proteins analyzed in this study, 18-20 were found to be N-terminally
blocked and of these only 6 were indicated as such in databases. The
probability of N-terminal blockage depended on the nature of the N-terminal
group. Twenty-six of the proteins had either M, S or A as N-terminal amino
acids and of these 17-19 were blocked. Only 1 in 10 proteins containing
other N-terminal groups were blocked.



1 Introduction

As compared with carrier ampholyte isoelectric focusing
(CA-IEF), the application of immobilized pH gradients
(IPGs) in the first dimension in 2-D gel electrophoresis
offers improved reproducibility [1] because the nature of
the pH gradient makes the resulting focusing positions
insensitive to the focusing time [2] and to the type of
sample applied [3]. The recently introduced ready-made
IPG strips [4] seem to be an ideal substitute for the carrier
ampholyte gradients, which until now have been the
most commonly used first dimensions in 2-D gel
electrophoresis. The availability of standardized first
dimensions opens the possibility of comparing 2-D gel maps of
various cell types generated in different laboratories,
provided that the focusing positions of a number of easily
recognizable polypeptide spots common to the cell types
in question are known. Even though this approach is
limited to experiments performed with the same
standardized IPG, the flexibility provided by IPGs allows the
pH gradient to be adjusted to the requirements of a
particular experiment.

Exchange and communication of 2-D gel protein data
requires a pH scale that is independent of the particular
IPG used and by which the results can be described. The
introduction of carbamylation trains and the relation of
focusing positions to the spots in these trains
represented a step forward towards solving the reproducibility
problem experienced with carrier ampholyte focusing [5].
Problems associated with the use of carbamylation trains
were mainly due to lack of temperature control and to
the use of nonequilibrium focusing conditions.
Accordingly, the pattern variation involved not only the
resulting pH gradients, but also the relative spot positions
as related to each other and to spots in the
carbamylation trains. Even though the question of reproducibility
has, to a large extent, been solved, the carbamylation
trains are still not ideal as markers because the spots in
the trains do not represent defined entities but rather a
large number of differently carbamylated peptides
having close pI values. As a result, the spots are large
and poorly defined as compared to the ordinary
polypeptide spots in 2-D gel maps.

Neidhardt et al. [6] defined the pH gradient in 2-D gel
experiments by pI markers whose pI values were calculated
from the amino acid composition. Focusing positions
of other polypeptides could be predicted from their
composition but the pK values needed for the pI
calculation [...] various groups employing this
approach do not use the same pK values [6, 7] and
therefore the pI values derived in this way cannot be
expected to describe the variation of the hydrogen ion
activity. In spite of this fact, it is still possible to make
approximate predictions of focusing positions because
the pK values used to define the pH gradient are also
used to calculate pI values and to predict the focusing
positions. Errors in pK assignments are therefore
compensated. A pH scale which correctly reflects the variation
of hydrogen ion activity during focusing should improve
the precision of the predictions, but this has never been
implemented with CA-IEF focusing as a first dimension
in 2-D gel electrophoresis. The main reason for this is
the problems associated with pH measurements in
[...] gels containing high concentrations of urea.

[...] can be described from the concentration variation
of the immobilized groups, provided that the pK values
of these groups are known for the conditions prevailing
during focusing. To avoid measurements on gels,
Gianazza et al. [8] suggested the use of pK values derived by
extrapolation of determined pK shifts. Recently, direct
determinations of pK differences between immobilized
groups in IPGs were made by determining pI-pK values
in overlapping narrow-range IPGs [9, 10] and the results
verified the applicability of the Gianazza approach. A
description of the focusing results in a pH scale, which
correctly describes the variation of the hydrogen ion
activity for the focusing conditions used, not only allows
comparison of 2-D gel maps generated with different
IPGs but also opens the possibility for correlating the
focusing position of a polypeptide with its composition.
Experiments by Bjellqvist et al. [9, 10] have implied
that pH scales showing good correlation between
calculated and experimental pI values can be derived for any
of the conditions commonly used for focusing in
connection with 2-D gel electrophoresis. These pH scales are
defined through the pK values of the immobilized
groups in the IPG containing gel. To be useful for
interlaboratory comparisons, however, the pH scale has to be
defined through pI values of easily recognizable spots
present in the 2-D gel map. So far, pI determinations in
a useful pH scale, combined with determinations of pK
values needed for pI calculations, have only been made
in the pH range 4.5-6.5 at 10 °C [9]. CA-IEF focusing as
described by O'Farrell [11] does not control the
temperature of the first dimension, which can be expected to be
slightly above room temperature. With IPGs, the
temperature commonly used is about 20 °C [4, 12] or 25 °C [13]
and this is a critical parameter that needs to be
controlled [14].



The present work was designed to compare 2-D gel maps
of different cell types in a laboratory applying both
CA-IEF and IPG focusing at a common temperature. To
this end we have generated 2-D gel maps of proteins
from noncultured, unfractionated normal human
epidermal keratinocytes with IPG in the first dimension
and a focusing temperature of 25 °C. We have used
commercial nonlinear, wide-range IPG strips which give 2-D
gel maps that are closely similar to the ones resulting
with the CA-IEF technique used to establish the human
keratinocyte database [15]. As an initial step towards
interlaboratory comparisons of results obtained with the
nonlinear gradient as a first dimension we report here
on the focusing positions of 41 known proteins that are
common to most human cell types. The pH range
covered corresponds to the range in classical CA-IEF
2-D gel electrophoresis and in order to use these
proteins as internal standards for comparing 2-D gel maps
generated with other IPGs we determined their pI values
with narrow-range IPGs in the first dimension. We have
compared the calculated versus experimental pI values
and show that it is necessary to have further information
(absence or presence and nature of posttranslational
modifications), in addition to amino acid composition, to
be able to calculate pI values that correspond to the
actual experimental values. The pK values used for the
calculations are provided and the usefulness of pI
prediction in relation to database information is discussed.
Furthermore, we comment on the possibility of using
experimentally determined pI values to verify the
available database information on polypeptide composition.

2 Materials and methods 



2.1 Apparatus and chemicals 

Equipment for isoelectric focusing and horizontal SDS
electrophoresis (Multiphor II electrophoresis chamber,
Immobiline strip tray, Multidrive XL programmable
power supply, Macrodrive power supply and Multitemp
II) was from Pharmacia LKB Biotechnology AB
(Uppsala, Sweden). Vertical second-dimensional gels
were run in the home-made equipment described in [15].
The IPG strips with the wide-range nonlinear pH
gradient were either Immobiline DryStrip pH 3-10 NL,
180 mm, or alternatively 160 mm long IPG strips with a
corresponding pH gradient. In both cases the IPG strips
were delivered by Pharmacia LKB. Immobiline,
Pharmalyte, Ampholine, GelBond as well as PAG film and the
ready-made horizontal SDS gels (ExcelGel XL SDS
12-14) were also from Pharmacia LKB. Purified proteins
and peptides were from Sigma (St. Louis, MO).

2.2 Sample preparation

Preparation and labeling of unfractionated keratinocytes
as well as fibroblasts have been described in [16]. Cells
were lysed in a solution containing 9.8 M urea, 2% w/v
NP-40, 100 mM DTT and 2% v/v Ampholine pH 7-9.

2.3 2-D gel electrophoresis

First-dimensional focusing was performed according to
Görg et al. [2] with some minor modifications, as
described in [9]. Rehydration of the IPG strips was made
in a solution containing 9.8 M urea, 2% w/v CHAPS, 10
mM DTT and 2% v/v carrier ampholyte mixture. The
carrier ampholyte mixture consisted of 2 parts Pharmalyte
pH 4-6.5, 1 part Ampholine pH 6-8 and 1 part Pharmalyte
pH 8-10.5. Usually, cathodic sample application was
used and the samples were diluted 2-20 times in a
solution containing 9.8 M urea, 4% w/v CHAPS, 1% w/v
DTT and 35 mM Tris base. For anodic application, the
Tris base was substituted with 100 mM acetic acid. The
degree of dilution and sample volume (20-100 µL)
depended on the particular sample and the IPG, and
whether visualization of the proteins was to be done by
Coomassie Brilliant Blue or silver staining. With the
wide-range nonlinear IPG, 10-30 µg of total protein
was loaded for silver staining and 100-200 µg for
Coomassie staining. Focusing was done overnight with Vh
products in the range of 45-60 kVh with 160 mm long
strips and 50-70 kVh with 180 mm long strips.
Solubilization of polypeptides and blocking of -SH groups prior
to the second-dimensional run, as well as loading on the
second-dimensional gel, was done as described in [9].
The stacking gel was omitted and 5-10 mm were left at
the top of the second-dimensional gel for applying the
IPG strip. The space was filled with electrode buffer
containing 0.5% w/v agarose. Casting, running, staining and
autoradiography were carried out as described in [15].

2.4 Experimental determination of pI values

The determination of the pK differences between
Immobilines pK 4.6, pK 6.2 and pK 7.0 necessary for the
calibration of the pH scale at 25 °C in 9.8 M urea was done
as described in [9] with the same narrow-range IPGs.
The pH scale was defined by setting the pK value of
Immobiline pK 4.6 equal to 4.61 [9] and the determined
pK differences gave the pK values of Immobilines pK 6.2
and pK 7.0 equal to 5.73 and 6.54, respectively. The pK
differences found are in good agreement with values
derived from [17] and [8] by extrapolation to 9.8 M urea
concentration. As in [9], additional narrow-range recipes
have been used for determining pI values. With
narrow-range IPGs extending to pH values higher than the pK
value of Immobiline pK 7.0, anodic sample application
was used with acetic acid added to the sample solution;
otherwise, cathodic sample application was used with
the same sample buffer as for wide-range IPGs.

2.5 Protein compositions used for pI calculations

With the exception of vimentin, protein compositions
are from the Swiss-Prot database [18]. For vimentin, we
used the data from [19], where the amino acid at
position 41 is a D instead of an S. Information in the
Swiss-Prot database on phosphorylation has been disregarded
because it was known from earlier studies (J. E. Celis,
unpublished results) that the spots in question
corresponded to the unphosphorylated forms of the peptides.



2.6 Calculation of pI values

For the pI calculations it was assumed that the same pK
value could be used for an amino acid residue in all
polypeptides and in all positions in the peptide except
N- or C-terminally placed amino acids. For the pK
values of the N-terminal amino groups the effect of the
different substituents on the α-carbon was taken into
account. The calculations of pI values were made with
the aid of the IPG-maker program [20].

2.7 pK values used for pI calculations

For the carboxyl terminal group and internal glutamyl
and aspartyl residues the same pK values were used as in
[9]. For C-terminal glutamyl and aspartyl residues,
separate pK values were derived with the aid of the Taft
equations [9, 21]. The pK values of histidyl groups were
calculated from the pI values of human carbonic
anhydrase I as in [9]. For N-terminal glycine a pK value of
7.50 was used. The pK shift caused by a substituent on
the α-carbon was assumed to be identical with the pK
shift the substituent caused for the amino group in the
amino acid, i.e. 2.28 pH units were subtracted from the
pK values for the amino groups in the amino acids given
in [22, 23]. The approximate pK value of 9 for the
cysteinyl group was taken from [24]. For tyrosyl and arginyl
groups we used the pK values for the amino acids [22,
23]. For lysyl groups the effect of high urea
concentration on amino groups was taken into account and 0.5 pH
units were subtracted from the amino acid pK value.
These last three pK values are far from the pH range
under study and the results found would have been the
same if lysyl and arginyl groups were assumed to be
fully ionized while the ionization of tyrosyl groups was
neglected. A complete list of the pK values used is given
in Table 1.

Table 1. pK values used for the ionizable groups in peptides;
9.8 M urea, 25 °C

Ionizable group                  pK

C-terminal                       3.55

N-terminal
  Ala                            7.59
  Met                            7.00
  Ser                            6.93
  Pro                            8.36
  Thr                            6.82
  Val                            7.44
  Glu                            7.70

Internal
  Asp                            4.05
  Glu                            4.45
  His                            5.98
  Cys                            9
  Tyr                            10
  Lys                            10
  Arg                            12

C-terminal side chain groups
  Asp                            4.55
  Glu                            4.75
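The calculation described in Sections 2.6 and 2.7 can be illustrated with a short sketch. It is not the IPG-maker program [20], whose algorithm is not described here; it simply assumes that every ionizable group follows the Henderson-Hasselbalch relation with the pK values of Table 1 and finds the pH of zero net charge by bisection. The function names, the `n_blocked` switch and the use of 7.50 (the value quoted for N-terminal glycine) as a fallback for N-terminal residues not listed in Table 1 are assumptions made for illustration.

```python
# pK values as read from Table 1 (9.8 M urea, 25 degrees C).
PK_C_TERM = 3.55
PK_N_TERM = {"A": 7.59, "M": 7.00, "S": 6.93, "P": 8.36,
             "T": 6.82, "V": 7.44, "E": 7.70}
PK_N_TERM_FALLBACK = 7.50  # assumption: glycine value used for unlisted residues
PK_ACIDIC = {"D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}   # side chains that lose a proton
PK_BASIC = {"H": 5.98, "K": 10.0, "R": 12.0}               # side chains that gain a proton
PK_C_TERM_SIDE = {"D": 4.55, "E": 4.75}                    # Asp/Glu side chain at the C-terminus


def net_charge(seq, ph, n_blocked=False):
    """Net charge of a fully unfolded polypeptide at a given pH."""
    seq = seq.upper()
    charge = 0.0
    if not n_blocked:
        # Free N-terminal amino group; a blocked (e.g. acetylated) end carries no charge.
        pk = PK_N_TERM.get(seq[0], PK_N_TERM_FALLBACK)
        charge += 1.0 / (1.0 + 10.0 ** (ph - pk))
    # C-terminal carboxyl group.
    charge -= 1.0 / (1.0 + 10.0 ** (PK_C_TERM - ph))
    last = len(seq) - 1
    for i, aa in enumerate(seq):
        if aa in PK_BASIC:
            charge += 1.0 / (1.0 + 10.0 ** (ph - PK_BASIC[aa]))
        elif aa in PK_ACIDIC:
            pk = PK_ACIDIC[aa]
            if i == last and aa in PK_C_TERM_SIDE:
                pk = PK_C_TERM_SIDE[aa]
            charge -= 1.0 / (1.0 + 10.0 ** (pk - ph))
    return charge


def isoelectric_point(seq, n_blocked=False, lo=0.0, hi=14.0, tol=1e-4):
    """pH at which the net charge is zero, located by bisection.

    The net charge falls monotonically with pH, so bisection is sufficient.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid, n_blocked) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    peptide = "SKDEH"  # arbitrary illustrative sequence, not a protein from this study
    print(round(isoelectric_point(peptide), 2))
    print(round(isoelectric_point(peptide, n_blocked=True), 2))
```

Comparing the two printed values shows the effect discussed later in the text: removing the charge of the N-terminal amino group lowers the calculated pI, and the shift is largest for short sequences with few ionizable groups near the pI.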



2.8 Statistical analysis

Statistical comparisons of the experimental and
calculated pI values were done on an Apple Macintosh IIsi
using the statistical package Statistica/Mac, release 3.0b
(from StatSoft Inc., Tulsa, Oklahoma). Calculated and
experimental pI values were compared by the t-test for
correlated samples (paired t-test). The normality of pI
differences was estimated graphically by probability
plots. The variances of the data presented here and the
similar data on plasma and liver proteins in [9] were
compared by the F-test.
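The two comparisons named in Section 2.8 can be sketched with the Python standard library. This is only an illustration of the statistics involved, not a reproduction of the Statistica/Mac analysis; the function names are hypothetical and the numbers in the usage lines are invented placeholders, not data from Table 2. Converting the statistics to p-levels requires the t and F distributions, which are left to a statistics package.

```python
import math
import statistics


def paired_t_statistic(calculated, experimental):
    """t statistic and degrees of freedom for a paired (dependent samples) t-test."""
    diffs = [c - e for c, e in zip(calculated, experimental)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


def variance_ratio(diffs_a, diffs_b):
    """F statistic (larger variance over smaller) with its two degrees of freedom."""
    var_a = statistics.variance(diffs_a)
    var_b = statistics.variance(diffs_b)
    if var_a >= var_b:
        return var_a / var_b, len(diffs_a) - 1, len(diffs_b) - 1
    return var_b / var_a, len(diffs_b) - 1, len(diffs_a) - 1


if __name__ == "__main__":
    # Invented placeholder values, for illustration only.
    calc = [5.02, 5.61, 6.30, 4.85]
    expt = [5.00, 5.64, 6.27, 4.86]
    print(paired_t_statistic(calc, expt))
```

A t statistic close to zero corresponds to the situation reported in the abstract, where calculated and experimental pI values do not differ significantly.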

3 Results and discussion 

3.1 Identification of polypeptides and pI determinations

The 2-D gel maps of [35S]methionine-labeled proteins
from noncultured, unfractionated normal human
keratinocytes, focused with the nonlinear, wide-range IPG and
CA-IEF pH gradients in the first dimension, are shown
in Figs. 1 and 2, respectively. The IPG extends to higher
pH values but otherwise the two patterns are very
similar and most of the spots in the IPG pattern can be
directly related to the corresponding spots in the
CA-IEF gel. To obtain comparable patterns it was
important to keep the focusing temperature as similar as
possible. Compared to other studies [1-4, 9, 10, 12-14],
we increased the urea concentration in the focusing gel
to 9.8 M because keratins streaked badly in the focusing
dimension when 8 M urea was used, presumably due to
aggregates of acidic and basic keratins. An increase in
urea concentration to 9 M or more eliminated these
streaks; apart from this effect, no other major changes in
the focusing positions were observed. In Fig. 1 we have
indicated the positions of 41 known proteins from the
human keratinocyte 2-D gel database that are most
likely common to most human cell types. The choice
was made because these proteins are easy to identify
with certainty. With the exception of stratifin (spot 2),
involucrin (spot 14) and keratin 14 (spot 15), which are all
epithelial markers, these proteins are also present in
human fibroblasts (Fig. 3) and lymphocytes (results not
shown), and therefore can be used as landmarks for
comparing 2-D gel maps derived from different cell types. In
Table 2 the 41 proteins are listed together with their
sample spot numbers (SSP) in the human keratinocyte
protein database and pI values determined in 2-D gel
maps generated with narrow-range IPGs in the first
dimension.

Figure 1. 2-D gel protein map of [35S]methionine-labeled proteins from
noncultured, unfractionated normal human keratinocytes focused with
the nonlinear, wide-range IPG in the first dimension. The position of
the 41 proteins analyzed in this study is indicated.

[...] in the first dimension. The position of the 41 proteins analyzed
in this study is indicated.

3.2 Comparison between the determined and calculated
pI values for human keratinocyte proteins

Thirty-six of the 41 proteins listed in Table 2 are found
in the Swiss-Prot database. Contrary to the plasma and
liver proteins used in [9], the pI calculations on the
proteins used in this study posed some problems that
reflected the way in which they were characterized. The
proteins used by Bjellqvist et al. [9] were [...]
abundant and well-characterized plasma [...] or they
were identified by N-terminal sequencing and, therefore,
the nature of [...] cases known. The proteins used in
[...] been identified [...] sequencing [...] and it is
known that N-terminal acetylation occurs with high
frequency in eukaryotic cells.
According to Brown and Roberts [25], proteins with
acetylated N-terminals correspond in weight to approximately
80% of the soluble protein in ascites cells. Based on
results from N-terminal sequencing, at least 40% of the
spots in the human liver protein 2-D gel map appear to
be blocked [3]. The corresponding number, derived from
107 spots in the 2-D gel map of human T-lymphocyte
proteins, falls between 60 and 65% (J. Strahler, personal
communication). Information concerning N-terminal
blockage is not normally available, and in the Swiss-Prot
database only 6 of the 36 keratinocyte proteins are
specified as N-terminally blocked. We have, within the present
material, defined 18 proteins for which the N-terminals
are very likely to be correctly described. Six of these
proteins are listed in the Swiss-Prot database as
N-terminally blocked, four represent proteins which appear in
the human liver 2-D gel map and have been
N-terminally sequenced as liver proteins [3] and the remaining
eight have N-terminal groups other than M, S and A, i.e.
N-terminals for which N-acetylation is uncommon [26].
In Figs. 4A, B, C and D, pI values calculated from
Swiss-Prot database information are plotted against the
experimentally determined pI values for all the keratinocyte
proteins listed in Table 2 and for the 18 selected
proteins, as well as for the plasma and liver proteins (data
from [9] valid for 10 °C)*.

* There are four plots: (A) the 36 polypeptides from normal human
keratinocytes (no corrections), (B) the 36 polypeptides from Fig. 4A
where pI values have been recalculated for 12 polypeptides with M,
S and A as N-terminals, assumed blocked, based on calculated
charge, (C) the 18 selected polypeptides with information on the
N-terminal configuration, and (D) plasma and liver proteins.

Figure 4. [...] experimental pI values. Lines [...] using the least
squares criterion. (A) 36 polypeptides from normal human
keratinocytes; no corrections. (B) 36 polypeptides from Fig. 4A (including
the 18 marker polypeptides) where pI values have been recalculated
assuming N-terminal blockage: x indicates recalculated pI values;
nucleolar protein B23 is indicated with an arrow. (C) 18 polypeptides
with information on N-terminal configuration and (D) plasma and
liver proteins.

The calculations show that without knowledge of the
status of the N-terminal group, precise predictions of pI
values for eukaryotic proteins cannot be achieved based
on the information available in Swiss-Prot and similar
databases. However, for proteins where the N-terminal
status is known, we find good correlation between
predicted and experimental pI values. When the variance of
the pI discrepancies and the variance of calculated
charges at the experimental pI values derived from the
present data set are compared with the corresponding
estimates, derived from the data on plasma and liver
proteins in [9] (Table 3), the present data are found to result
in larger variances for the values of both pI discrepancies
and calculated charge at the experimental pI value when
no information on posttranslational modification is
taken into consideration. Correction for possible
N-acetylation of 12 polypeptides with M, S and A as N-terminal
results in a smaller variance of pI discrepancies,
although not significantly different from values derived
from [9], whereas the variance of the calculated charge at
the experimental pI value is significantly higher. For the
selected proteins the variance for the pI discrepancies
is significantly smaller than for the data in [9]; however,
the corresponding value for calculated charge at the
experimental pI value does not improve to the same
extent. This, we believe, reflects another difference
between the two sets of proteins used for the
calculations. Based on spot distributions in 2-D gel maps, the
set of proteins used here has a molecular weight
distribution that is more representative of the patterns
observed in mammalian cells. In the study by Bjellqvist
et al. [9] most of the high molecular weight plasma
proteins had to be excluded due to their unknown content
of sialic acid which made the proteins analyzed in this
study heavily biased towards low molecular weight
proteins. The buffer capacity of proteins normally increases
with the protein's molecular weight, and the average
buffer capacity of the presently selected proteins with
assumed known N-terminals is 18 charge units/pH unit,
while the corresponding value for the proteins used in
[9] is only 9 charge units/pH unit. High buffer capacity
can be expected to improve the agreement between
calculated and experimental pI values. Inspection of the
data presented in Table 2 for the polypeptides with
assumed known N-terminals verifies the importance of
the buffer capacity. For 8 polypeptides having buffer
capacities higher than 15 charge units/pH unit, the
calculations in all cases yielded pI discrepancies with absolute
values of less than 0.02 pH units. The largest
discrepancy, 0.06 pH units, was observed for annexin II and
[...]min, proteins which have low buffer capacity: 0.9
and 6.6 charge units/pH unit, respectively. The
probability that the focusing position of a protein with known
composition will fall within a certain distance from the
calculated pI value therefore cannot be predicted by the
variance alone. The buffer capacity of the specific protein
must be taken into consideration as well. As indicated
by the decrease of the variance of calculated charges at
the experimental pI value for the selected proteins, the
observed improvement cannot solely be due to the
higher buffer capacity of the keratinocyte proteins. The
two studies relate to different experimental conditions.
Good agreement between experimental and calculated
pI values implies that the proteins are defolded and a
factor that may contribute to the observed improvement
is a more complete defolding of proteins caused by the
higher temperature and urea concentration used in this
study.

The data indicate that the precision with which pI values can be
predicted for polypeptides with high buffer capacity is better than
the precision with which experimental pI values can be determined.
If the pH is defined through the pK values of the immobilized
groups in the IPG-containing gel, the precision of the
experimentally calculated data will depend on the pH difference
between the pI and the pK value of the immobilized group with the
closest pK. For the present study this will give pI determinations
with a precision varying in the range of ± 0.02-0.05 pH units [9].
The good agreement observed between the calculated and experimental
pI values is due to the fact that errors are mainly systematic and,
as discussed in [9], they will largely be cancelled out in the
calculations. A pH scale defined through the presently determined
pI values will not necessarily reflect the variation of the
hydrogen ion activity during the focusing step in an optimal way,
but it still allows precise predictions of focusing positions for
polypeptides with known compositions, including information on
posttranslational modifications. The calculated net charge at the
experimentally found isoelectric point defined in this scale will
serve as a tool to verify that the polypeptide composition used in
the calculation is correct and complete. Exceptions to this are
proteins such as involucrin and heat shock protein 90 that have
very high buffer capacities. Introduction of an extra charge unit
into these proteins will only result in pI shifts falling in the
range of 0.01-0.02 pH units, and the effect is that the quality of
the pH definition (the precision with which the pK values used in
the calculations are given and the precision of experimental pI
values in these cases) will limit the possibilities to verify
polypeptide composition based on the experimental pI value.
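
The relation between an extra charge unit and the resulting pI shift can be sketched with a short calculation. This is only a minimal illustration, assuming the titration curve is locally linear around the pI so that the shift is roughly the charge change divided by the buffer capacity; the function name and the example buffer capacities of 50 and 100 charge units/pH unit are hypothetical, chosen because they reproduce the 0.01-0.02 pH unit range quoted above.

```python
def pi_shift(delta_charge, buffer_capacity):
    """Approximate size of the pI shift (pH units) for a given charge change.

    Assumption (not from the source): the titration curve is treated as
    linear near the pI, so shift = charge change / buffer capacity, with the
    buffer capacity given in charge units per pH unit.
    """
    if buffer_capacity <= 0:
        raise ValueError("buffer capacity must be positive")
    return delta_charge / buffer_capacity


# Average buffer capacities quoted in the text: 9 and 18 charge units/pH unit.
for capacity in (9, 18, 50, 100):  # 50 and 100 are hypothetical high values
    print(capacity, round(pi_shift(1, capacity), 3))
```

Under the same linear assumption, a 0.11 pH unit discrepancy that corresponds to 3 charge units would imply a buffer capacity of roughly 27 charge units/pH unit.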

Statistical comparison of experimental and calculated pI values was
done using the t-test for dependent samples, and normality of the
discrepancies was estimated by probability plots. For the 36
proteins, the p-level is 0.0021, indicating that a result like this
is unlikely to be a chance effect and must be assumed to represent
a real difference. After correction for the most likely N-terminal
configuration, the p-level is 0.043, and the experimental and
calculated values still cannot be accepted as representing the same
population, since the p-level is less than 0.05, the traditional
limit of statistical significance. For the 18 proteins with a known
or very likely N-terminal configuration the t-test gave a p-level
of 0.49, which verifies that the experimental and calculated pI
values are not significantly different.
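
The t-test for dependent samples mentioned above works on the pairwise differences between experimental and calculated values. The sketch below computes only the t statistic and degrees of freedom using the Python standard library; the function name and the example numbers are hypothetical and are not data from the study, and converting t to a p-level would additionally need a t-distribution table or a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t(experimental, calculated):
    """Return (t statistic, degrees of freedom) for a dependent-samples t-test."""
    if len(experimental) != len(calculated):
        raise ValueError("both samples must have the same length")
    diffs = [e - c for e, c in zip(experimental, calculated)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical pI values, for illustration only.
t_value, dof = paired_t([5.02, 6.10, 4.75, 7.30], [5.00, 6.12, 4.70, 7.31])
print(round(t_value, 3), dof)
```

A t statistic close to zero relative to its degrees of freedom corresponds to a large p-level, which is the situation reported for the 18 proteins with known N-terminal configuration.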

Besides showing that pI values for denatured proteins with known
compositions can be calculated with a high degree of precision from
average pK values, the results also provide strong support for the
notion that N-terminal blockage heavily depends on the nature of
the N-terminal groups [26]. The results seem to indicate that with
N-terminals other than M, S and A, only a few proteins have blocked
N-terminals (1 out of 10 proteins in the present study), while it
can be inferred from the data presented in Table 2 that a majority
of the proteins with M, S and A as N-terminal are blocked. After
correction for the effect of suspected N-terminal blockage there is
only one protein (nucleolar protein B23) out of the 36 used in this
study which, in spite of a high buffer capacity, has a marked
difference of 0.11 pH units between predicted and determined pI
values (Fig. 4B); this corresponds to 3 charge units due to the
high buffer capacity of this protein. This discrepancy in pI
prediction and calculation of net charge at the pI is probably not
due to deficiencies in the database information but instead
reflects a shortcoming of the model used for pI calculations.
Nucleolar protein B23 contains a domain extremely rich in aspartic
and glutamic acid residues (Table 4), in which 26 out of 28 amino
acid residues from position 161 to 188 are either a D or an E. A
calculation based on the use of average pK values uninfluenced by
the charged neighboring amino acid residues cannot be expected to
correctly describe the pI value with almost half of the acidic
groups packed together into a highly negatively charged region.
This limitation caused by calculations based on average pK values
does not severely limit the usefulness of the approach, since a
search through Swiss-Prot shows that this type of D/E-rich motif is
uncommon, and the existence of a highly charged region is
immediately apparent upon inspection of the amino acid sequence.

Table 4. Amino acid sequence of nucleolar phosphoprotein B23

The quality of the information available in databases, especially
concerning posttranslational modifications, is a major problem when
the data are to be used for pI predictions. The p-level of 0.043
found for all 36 proteins after correction for N-acetylation shows
that this problem is not limited to N-terminal blockage, and the
very good agreement found for the eighteen polypeptides with
assumingly correctly described N-terminals (Fig. 4C) must be
regarded as an exception from this point of view. N-terminal
blockage is generally the main problem in relation to pI
predictions for eukaryotic proteins. Of the 36 keratinocyte
proteins analyzed, 18-20 are suspected to be N-terminally blocked
(6 proteins blocked according to Swiss-Prot, 12 proteins with M, S
or A as N-terminal and assumingly blocked based on the calculated
charge, and two proteins, involucrin and nucleolar protein B23,
with M as N-terminal for which the data do not allow any
conclusion). This is in reasonable agreement with the conclusions
based on the N-terminal sequencing data derived in connection with
2-D gel electrophoresis. N-terminal blockage can be suspected for
17-19 of the 26 proteins with M, S or A as N-terminal, while only 1
in 10 proteins with other N-terminal groups are blocked. The
information that the frequency of N-terminal blockage is strongly
related to the nature of the N-terminal group will be of some help
in connection with pI predictions based on database information.
However, without information from other sources, an uncertainty
will always remain as to whether the N-terminal charge should be
included in the pI calculation.
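
The effect of including or omitting the N-terminal charge can be illustrated with a minimal pI calculation. The sketch below sums Henderson-Hasselbalch charge contributions and finds the pH of zero net charge by bisection. The pK values are illustrative placeholders assumed for this example and are not the average pK values used in the study, and the function names and test sequence are hypothetical.

```python
# Illustrative pK values (assumptions for this sketch only).
ACIDIC_PK = {"C_term": 3.55, "D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}
BASIC_PK = {"N_term": 7.5, "H": 5.98, "K": 10.0, "R": 12.0}


def net_charge(sequence, ph, n_terminal_free=True):
    """Net charge of an unfolded polypeptide at a given pH."""
    basic = {res: sequence.count(res) for res in "HKR"}
    if n_terminal_free:
        basic["N_term"] = 1  # a blocked N-terminal carries no charge
    acidic = {res: sequence.count(res) for res in "DECY"}
    acidic["C_term"] = 1
    charge = 0.0
    for group, count in basic.items():
        charge += count / (1.0 + 10 ** (ph - BASIC_PK[group]))
    for group, count in acidic.items():
        charge -= count / (1.0 + 10 ** (ACIDIC_PK[group] - ph))
    return charge


def isoelectric_point(sequence, n_terminal_free=True, tol=1e-4):
    """Bisection on pH; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid, n_terminal_free) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


sequence = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical test sequence
print(round(isoelectric_point(sequence, n_terminal_free=True), 2))
print(round(isoelectric_point(sequence, n_terminal_free=False), 2))
```

With these placeholder values, treating the N-terminal as blocked lowers the calculated pI, which is the direction of the correction discussed above.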



4 Concluding remarks 

The data presented here lay the foundation for comparing 2-D gel
protein maps of different cell types generated with nonlinear,
wide-range IPGs in the first dimension. The focusing positions of
41 polypeptides common to most human cell types have been described
in a pH scale that allows focusing positions to be predicted with a
high degree of accuracy, provided that the composition of the
polypeptides is known and that information on posttranslational
modifications is available. For polypeptides with a very high
buffer capacity, the limiting factor is the precision with which
experimental pH values can be determined rather than the precision
of the calculations. Possible deficiencies in the pH scale
description of the variation of the hydrogen ion activity have, at
least at the present state, no consequences for its practical use.
The major limitation in connection with predictions of focusing
positions from polypeptide compositions is the quality of existing
data on protein compositions, especially concerning
posttranslational modifications. Amino acid sequences have been
reasonably easy to obtain, while posttranslational modifications
have been difficult and work-intensive to determine. Recent
developments in the field of mass spectrometry are fast changing
this situation, and within the next years we can expect a surge in
reliable data in this area. While awaiting this development,
verification of correctness and completeness of available
information on polypeptide composition can be provided by
experimental pI values in a pH scale based on the pI values
determined in this study. So far, our data cover the pH range below
pH ≈ 7.5. The basic pH range covered by NEPHGE as first dimension
will be covered in forthcoming work.

Received December 29, 1993

5 References

[1] Gianazza, E., Astrua-Testori, S., Caccia, P., Giacon, P., Quaglia, L., Righetti, P. G., Electrophoresis 1986, 7, 76-83.
[2] Görg, A., Postel, W., Günther, S., Electrophoresis 1988, 9, 531-546.
[3] Hochstrasser, D. F., Frutiger, S., Paquet, N., Bairoch, A., Ravier, F., Pasquali, C., Sanchez, J.-C., Tissot, J.-D., Bjellqvist, B., Vargas, R., Appel, R. D., Hughes, G. J., Electrophoresis 1992, 13, 992-1001.
[4] Immobiline DryStrip Kit for 2-D Electrophoresis: Instructions, Pharmacia LKB Biotechnology AB, Uppsala 1993.
[5] Anderson, N. L., Hickman, B. J., Anal. Biochem. 1979, 93, 312-320.
[6] Neidhardt, F. C., Appleby, D. A., Sankar, P., Hutton, M. E., Phillips, T. A., Electrophoresis 1989, 10, 116-121.
[7] Rasmussen, H. H., Van Damme, J., Puype, M., Gesser, B., Celis, J. E., Vandekerckhove, J., Electrophoresis 1992, 13, 960-969.
[8] Gianazza, E., Artoni, G., Righetti, P. G., Electrophoresis 1983, 4, 321-326.
[9] Bjellqvist, B., Hughes, G. J., Pasquali, C., Paquet, N., Ravier, F., Sanchez, J.-C., Frutiger, S., Hochstrasser, D. F., Electrophoresis 1993, 14, 1023-1031.
[10] Bjellqvist, B., Pasquali, C., Ravier, F., Sanchez, J.-C., Hochstrasser, D. F., Electrophoresis 1993, 14, 1357-1365.
[11] O'Farrell, P. H., J. Biol. Chem. 1975, 250, 4007-4021.
[12] Görg, A., Biochem. Soc. Transactions 1993, 21, 130-132.
[13] Hanash, S. M., Strahler, J. R., Neel, J. V., Hailat, N., Melhem, R., Keim, D., Zhu, X. X., Wagner, D., Gage, D. A., Watson, J. T., Proc. Natl. Acad. Sci. USA 1991, 88, 5709-5713.
[14] Görg, A., Postel, W., Friedrich, C., Kuick, R., Strahler, J. R., Hanash, S. M., Electrophoresis 1991, 12, 653-658.
[15] Celis, J. E., Rasmussen, H. H., Olsen, E., Madsen, P., Leffers, H., Honoré, B., Dejgaard, K., Gromov, P., Hoffmann, H. J., Nielsen, M., Vassilev, A., Vintermyr, O., Hao, J., Celis, A., Basse, B., Lauridsen, J. B., Ratz, G. P., Andersen, A. H., Walbum, E., Kjaergaard, I., Puype, M., Van Damme, J., Deley, B., Vandekerckhove, J., Electrophoresis 1993, 14, 1091-1198.
[16] Celis, J. E., Madsen, P., Rasmussen, H. H., Leffers, H., Honoré, B., Gesser, B., Dejgaard, K., Olsen, E., Magnusson, N., Kiil, J., Celis, A., Lauridsen, J. B., Basse, B., Ratz, G. P., Andersen, A. H., Walbum, E., Brandstrup, B., Pedersen, P. S., Brandt, N. J., Puype, M., Van Damme, J., Vandekerckhove, J., Electrophoresis 1991, 12, 802-872.
[17] Bjellqvist, B., Ek, K., Righetti, P. G., Gianazza, E., Görg, A., Postel, W., Westermeier, R., J. Biochem. Biophys. Methods 1982, 6, 317-339.
[18] Bairoch, A., Boeckmann, B., Nucleic Acids Res. 1991, 19, 2247-2249.
[19] Honoré, B., Madsen, P., Basse, B., Andersen, A., Walbum, E., Celis, J. E., Leffers, H., Nucleic Acids Res. 1990, 18, 6692.
[20] Altland, K., Electrophoresis 1990, 11, 140-147.
[21] Perrin, D. D., Dempsey, B., Serjeant, E. P., pKa Prediction for Organic Acids and Bases, Chapman and Hall Ltd., London 1981.
[22] Perrin, D. D., Dissociation Constants of Organic Bases in Aqueous Solutions, Butterworths, London 1965.
[23] Perrin, D. D., Dissociation Constants of Organic Bases in Aqueous Solutions, Supplement 1972, Butterworths, London 1972.
[24] Altland, K., Becher, P., Rossmann, U., Bjellqvist, B., Electrophoresis 1988, 9, 474-485.
[25] Brown, J. L., Roberts, W. K., J. Biol. Chem. 1976, 251, 1009-1014.
[26] Persson, B., Flinta, C., von Heijne, G., Jörnvall, H., Eur. J. Biochem. 1985, 152, 523-527.






Docket No.: PF-0505-2 DIV 
USSN: 09/828,423 
Ref. No. 6 




Benjamin Lewin



Oxford New York Tokyo 
Oxford University Press 

1997 



Oxford University Press, Great Clarendon Street, Oxford OX2 6DP

Oxford New York
Athens Auckland Bangkok Bogota Bombay Buenos Aires
Calcutta Cape Town Dar es Salaam Delhi Florence Hong Kong
Istanbul Karachi Kuala Lumpur Madras Madrid Melbourne
Mexico City Nairobi Paris Singapore Taipei Tokyo Toronto

and associated companies in
Berlin Ibadan

Oxford is a trade mark of Oxford University Press

Published in the United States
by Oxford University Press, Inc., New York

© Oxford University Press and Cell Press, 1997

All rights reserved. No part of this publication may be
reproduced, stored in a retrieval system, or transmitted, in any
form or by any means, without the prior permission in writing of Oxford
University Press. Within the UK, exceptions are allowed in respect of any
fair dealing for the purpose of research or private study, or criticism or
review, as permitted under the Copyright, Designs and Patents Act, 1988, or
in the case of reprographic reproduction in accordance with the terms of
licences issued by the Copyright Licensing Agency. Enquiries concerning
reproduction outside those terms and in other countries should be sent to
the Rights Department, Oxford University Press, at the address above.

This book is sold subject to the condition that it shall not,
by way of trade or otherwise, be lent, re-sold, hired out, or otherwise
circulated without the publisher's prior consent in any form of binding
or cover other than that in which it is published and without a similar
condition including this condition being imposed
on the subsequent purchaser.

A catalogue record for this book is available from the British Library

Library of Congress Cataloging in Publication Data

(Data available)

Typeset by Wyvern Typesetting Ltd, Bristol

Printed in The United States of America



The phenotypic differences that distinguish the
various kinds of cells in a higher eukaryote are
largely due to differences in the expression of
genes that code for proteins, that is, those
transcribed by RNA polymerase II. In principle, the
expression of these genes might be regulated at
any one of several stages. The concept of the
"level of control" implies that gene expression
is not necessarily an automatic process once it
has begun. It could be regulated in a
gene-specific way at any one of several sequential
steps. We can distinguish (at least) five
potential control points, forming the series:

Activation of gene structure
↓
Initiation of transcription
↓
Processing the transcript
↓
Transport to cytoplasm
↓
Translation of mRNA

The existence of the first step is implied by
the discovery that genes may exist in either of
two structural conditions. Relative to the state
of most of the genome, genes are found in
an "active" state in the cells in which they
are expressed (see Chapter 27). The change of
structure is distinct from the act of
transcription, and indicates that the gene is
"transcribable." This suggests that acquisition of the
"active" structure must be the first step in gene
expression.

Transcription of a gene in the active state is
controlled at the stage of initiation, that is, by
the interaction of RNA polymerase with its
promoter. This is now becoming susceptible to
analysis in the in vitro systems (see Chapter
28). For most genes, this is a major control
point; probably it is the most common level of
regulation.

There is at present no evidence for control
at subsequent stages of transcription in
eukaryotic cells, for example, via antitermination
mechanisms.

The primary transcript is modified by capping
at the 5' end, and usually also by
polyadenylation at the 3' end. Introns must be spliced out
from the transcripts of interrupted genes. The
mature RNA must be exported from the nucleus
to the cytoplasm. Regulation of gene expression
by selection of sequences at the level of nuclear
RNA might involve any or all of these stages,
but the one for which we have most evidence
concerns changes in splicing; some genes are
expressed by means of alternative splicing
patterns whose regulation controls the type of
protein product (see Chapter 30).

Finally, the translation of an mRNA in the
cytoplasm can be specifically controlled. There is little
evidence for the employment of this mechanism in
adult somatic cells, but it does occur in some
embryonic situations, as described in Chapter 7.
The mechanism is presumed to involve the
blocking of initiation of translation of some mRNAs by
specific protein factors.

But having acknowledged that control of gene
expression can occur at multiple stages, and
that production of RNA cannot inevitably be
equated with production of protein, it is clear



Chapter 29 



that the overwhelming majority of regulatory
events occur at the initiation of transcription.
Regulation of tissue-specific gene transcription
lies at the heart of eukaryotic differentiation;
indeed, we see examples in Chapter 38 in
which proteins that regulate embryonic
development prove to be transcription factors. A
regulatory transcription factor serves to provide
common control of a large number of target
genes, and we seek to answer two questions
about this mode of regulation: what identifies
the common target genes to the transcription
factor; and how is the activity of the
transcription factor itself regulated in response to
intrinsic or extrinsic signals?



Response elements identify genes under common 
regulation 



The principle that emerges from characterizing
groups of genes under common control is that
they share a promoter element that is recognized
by a regulatory transcription factor. An element
that causes a gene to respond to such a factor
is called a response element; examples are the
HSE (heat shock response element), GRE
(glucocorticoid response element), and SRE (serum
response element).

The properties of some inducible transcription
factors and the elements that they recognize are
summarized in Table 29.1. Response elements
have the same general characteristics as
upstream elements of promoters or enhancers.
They contain short consensus sequences, and
copies of the response elements found in
different genes are closely related, but not
necessarily identical. The region bound by the factor
extends for a short distance on either side of
the consensus sequence. In promoters, the
elements are not present at fixed distances from
the startpoint, but are usually <200 bp upstream
of it. The presence of a single element usually
is sufficient to confer the regulatory response,
but sometimes there are multiple copies.

Table 29.1 Inducible transcription factors bind to
response elements that identify groups of promoters
or enhancers subject to coordinate control.

Regulatory Agent   Module   Consensus          Factor
Heat shock         HSE      CNNGAANNTCCNNG     HSTF
Glucocorticoid     GRE      TGGTACAAATGTTCT    Receptor
Phorbol ester      TRE      TGACTCA            AP1
Serum              SRE      CCATATTAGG         SRF
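
The idea of a short consensus sequence that marks a gene for a given factor can be sketched as a pattern search. The example below treats the four consensus sequences from Table 29.1 as exact patterns, with N standing for any base. This is a simplification assumed for illustration: real copies are described as closely related but not necessarily identical, only one strand is scanned, and the function names and test sequence are hypothetical.

```python
import re

# Consensus sequences as listed in Table 29.1; N matches any base.
RESPONSE_ELEMENTS = {
    "HSE": "CNNGAANNTCCNNG",
    "GRE": "TGGTACAAATGTTCT",
    "TRE": "TGACTCA",
    "SRE": "CCATATTAGG",
}


def consensus_to_regex(consensus):
    """Turn a consensus string into a regex, expanding N to [ACGT]."""
    body = "".join("[ACGT]" if base == "N" else base for base in consensus)
    # A lookahead lets overlapping occurrences be reported as well.
    return re.compile("(?=(" + body + "))")


def find_elements(sequence):
    """Return (element name, 0-based start) pairs, sorted by position."""
    sequence = sequence.upper()
    hits = []
    for name, consensus in RESPONSE_ELEMENTS.items():
        for match in consensus_to_regex(consensus).finditer(sequence):
            hits.append((name, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical upstream sequence containing one TRE and one HSE.
print(find_elements("AAATGACTCAGGCTTGAAGGTCCAAGTT"))
```

A search that tolerated mismatches, or a position weight matrix, would be closer to how related but non-identical copies are recognized; exact matching is used here only to keep the sketch short.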

Response elements may be located in promoters
or in enhancers. Some types of elements
are typically found in one rather than the other:
usually an HSE is found in a promoter, while a
GRE is found in an enhancer. We assume that
all response elements function by the same
general principle. A gene is regulated by a
sequence at the promoter or enhancer that is
recognized by a specific protein. The protein
functions as a transcription factor needed for
RNA polymerase to initiate. Active protein is
available only under conditions when the gene is
to be expressed; its absence means that the
promoter is not activated by this particular circuit.
An example of a situation in which many
genes are controlled by a single factor is
provided by the heat shock response. This is
common to a wide range of prokaryotes and
eukaryotes and involves multiple controls of
gene expression; an increase in temperature
turns off transcription of some genes, turns on
transcription of the heat shock genes, and
causes changes in the translation of mRNAs.
The control of the heat shock genes illustrates
the differences between prokaryotic and
eukaryotic modes of control. In bacteria, a
sigma factor is synthesized that directs RNA
polymerase holoenzyme to recognize
A two-dimensional gel database of rat liver proteins
useful in gene regulation and drug effects studies 

A standard two-dimensional (2-D) protein map of Fischer 344 rat liver 
(F344MST3) is presented, with a tabular listing of more than 1200 protein species. 
Sodium dodecyl sulfate (SDS) molecular mass and isoelectric point have been es- 
tablished, based on positions of numerous internal standards. This map has been 
used to connect and compare hundreds of 2-D gels of rat liver samples from a va- 
riety of studies, and forms the nucleus of an expanding database describing rat 
liver proteins and their regulation by various drugs and toxic agents. An example 
of such a study, involving regulation of cholesterol synthesis by cholesterol-lower-
ing drugs and a high-cholesterol diet, is presented. Since the map has been ob-
tained with a widely used and highly reproducible 2-D gel system (the Iso-Dalt®
system), it can be directly related to an expanding body of work in other laborato-
ries.
1 Introduction

High-resolution two-dimensional electrophoresis of pro-
teins, introduced in 1975 by O'Farrell and others [1-4], has
been used over the ensuing 16 years to examine a wide va- 
riety of biological systems, the results appearing in more 
than 5000 published papers. With the advent of computer- 
ized systems for analyzing two-dimensional (2-D) gel ima- 
ges and constructing spot databases, it is also possible to 
plan and assemble integrated bodies of information de- 
scribing the appearance and regulation of thousands of pro- 
tein gene products [5, 6]. Creating such databases involves 
amassing and organizing quantitative data from thousands 
of 2-D gels, and requires a substantial commitment in tech- 
nology and resources. 

Given the long-term effort required to develop a protein
database, the choice of a biological system takes on considerable
importance. While in vitro systems are ideal for answering many
experimental questions, especially in cancer research and
genetics, our experience with cell cultures and tissue samples
suggests that some in vivo approaches could have major advantages.
In particular, we have noticed that liver tissue samples from rats
and mice appear to show greater quantitative reproducibility (in
terms of individual protein expression) than replicate cell
cultures. This is perhaps a natural result of the homeostasis
maintained in a complete animal vs. the well-known variability of
cell cultures, the latter due principally to differences in
reagents (e.g., fetal bovine serum), conditions (e.g., pH) and
genetic "evolution" of cell lines while in culture. It is also
more difficult to generate adequate amounts of protein from cell
culture systems (particularly with attached cells), forcing the
investigator to resort to radioisotope-based or silver-based stain
detection methods. While these methods are more sensitive
(sometimes much more sensitive) than the Coomassie Brilliant Blue
(CBB) stain typically used for protein detection in "large"
protein samples, they are generally more variable, more
labor-intensive and, in the case of radiographic methods, may
generate highly "noisy" images, due to the properties of the films
used. By contrast, large protein samples can easily be prepared
from liver using urea/Nonidet P-40 (NP-40) solubilization and
stained with CBB, which has the advantage of being easily
reproducible [8]. Finally, there remains the question of the
"truthfulness" of many in vitro systems as compared to their in
vivo analogs: how great are the changes caused by the introduction
into culture and the associated shift to strong selection for
growth, and how do these affect experimental outcomes? Hence the
apparent advantages of in vitro systems, in terms of experimental
manipulation, may be counterbalanced by other factors relating to
2-D data quality.

There is a second important class of reasons for exploring 
the use of an in vivo biological system such as the liver. His- 
torically, there have been two broad approaches to the me- 
chanistic dissection of biochemical processes in intact cel- 
lular systems: genetics (a search for informative mutants) 
and the use of chemical agents (drugs and chemical toxins). 
Both approaches help us to understand complex systems 
by disrupting some specific functional element and show- 
ing us the result. With the development of techniques for 
genetic manipulation and cloning, the genetic approach 
can be effectively applied either in vitro or in vivo, although 
the in vitro route is usually quicker. The chemical approach 
can also be applied to either sort of biological system; here, 
however, the bulk of consistently acquired information is 
in experimental animals (rats and mice). While most biolo-
gists know a short list of compounds having specific, experi-
mentally useful effects (inhibitors of protein synthesis,
ionophores, polymerase inhibitors, channel blockers, nu-
cleotide analogs, and compounds affecting polymerization
of cytoskeletal proteins), there is a much larger number of
interesting chemically-induced effects, most of them char-
acterized by toxicologists and pharmacologists in rodent
systems. Just as a thorough genetic analysis would involve
saturating a genome with mutations, it is possible to ima- 
gine a saturating number of drugs, the analysis of whose ac- 
tions would reveal the complete biochemistry of the cell. 
While organized drug discovery efforts usually target spe- 
cific desired effects, the nature of the process, with its de- 
pendence on screening large numbers of compounds, ne- 
cessarily produces many unanticipated effects. It is there- 
fore reasonable to suppose that the required broad range of 
compounds necessary to achieve "biochemical saturation" 
may be forthcoming; in fact, it may already exist among the 
hundreds of thousands of compounds that failed to qualify 
as drugs. 

Among organs, the liver is an obvious choice for the study 
of chemical effects because of its well-known plasticity and 
responsiveness. The brain appears to be quite plastic (e.g. 
[7]), but it is a complicated mixture of cell types requiring 
skillful dissection for most experiments. The kidney, while 
quite responsive, also presents a potentially confounding 
mixture of cell types. The liver, by contrast, is made up of 
one predominant cell type which is easy to solubilize: the 
hepatocyte, representing more than 95% of its mass. Most 
importantly, the liver performs many homeostatic func- 
tions that require rapid modulation of gene expression. It 
appears that most chemical agents tested affect gene ex- 
pression in the liver at some dosage (N. Leigh Anderson, 
unpublished observations), an interesting contrast to our 
earlier work with lymphocytes, for example, which seem to 
be much less responsive. Such results conform to the expec- 
tation that cells with a homeostatic, physiological role 
should be more plastic than cells differentiated for a pur- 
pose dependent on the action of a limited number of spe- 
cific genes. 

The liver also allows the parallels between in vitro and in
vivo systems to be examined in detail. Significant progress
has been made in the development of mouse, rat and hu-
man hepatocyte culture systems, as well as in precision-cut
tissue slices. Using such an array of techniques, it is possi-
ble to assemble a matrix of mammalian systems including
mouse and rat in vivo on one level and mouse, rat and hu-
man in vitro on a second level, and to compare effects be-
tween species and between systems. This approach allows
us to draw informed conclusions regarding the biochemical
"universality" of biological responses among the mammals
and to offer some insight into the validity of in vitro ap-
proaches for toxicological screening. We believe this c[...]
will be necessary if in vitro alternatives are to achieve wide
usage in government-mandated safety testing of drugs, con-
sumer products and industrial and agricultural chemicals.

A number of interesting studies have been published using
2-D mapping to examine effects in the rodent liver. A num-
ber of investigators have made use of the technique to
screen for existing genetic variants [8-11] or induced muta-
tions [12-14], mainly in the mouse. This work builds on the
wealth of genetic information available on the mouse and
its established position as a mammalian mutation-detec-
tion system. While some studies of chemical effects have
been undertaken in the mouse [15-17], most have used the
rat [18-23]. The examination of the cytochrome P-450 sys-
tem, in particular, has been carried out almost exclusively
on the rat [24, 25].

These considerations lead us to conclude that rodent liver
offers the best opportunity to systematically examine an 
array of gene regulation systems, and ultimately to build a 
predictive model of large-scale mammalian gene control. 
The basic underlying foundation of such a project is a reli- 
able, reproducible master 2-D pattern of liver, to which on- 
going experimental results can be referred. In this paper, we 
report such a master pattern for the acidic and neutral pro- 
teins of rat liver (pattern F344MST3). In future, this master 
will be supplemented by maps of basic proteins, and analog- 
ous maps of mouse and human liver. 



2 Materials and methods 
2.1 Sample preparation 

Liver is an ideal sample material for most biochemical stud- 
ies, including 2-D analysis. A sample is taken of approxima- 
tely 0.5 g of tissue from the apical end of the left lobe of the 
liver. Solubilization is effected as rapidly as practical; a 
delay of 5—15 min appears to cause no major alteration in 
liver protein composition if the liver pieces are kept col 
(e.g., on ice) in the interim. In the solubilization process, 
the liver sample is weighed, placed in a glass homogemzer 
(e.g., 15 mLWheaton); 8 volumes of solubilizing solution 

* The solubilizing solution is composed of 2% NP-40 (Sigma), 9 M 
(analytical grade, e.g., BDH or Bio-Rad), 0.5% dithiothreitol (l^j 
Sigma) and 2% carrierampholytes (pH 9-1 1 LKB: these come asa .^j 
stock solution, so 2 % final concentration is achieved by making^ 1 * ^ 
solution 10% 9-11 Ampholine by volume). A large batchofso lu j" ^ 
(several hundred mL) is made and stored frozen at -80 °C in 
sufficient to provide enough for one day's estimated sample P 
lion requirement. The solution is never allowed to become 
than room temperature at any stage during preparation ortha * ntJ jnr 
use, since heating of concentrated urea solutions can produce c 
nanis that covalently modify proteins producing artifactua 
shifts. Once thawed, any unused solubilizer is discarded. 
are added (i.e., 4 mL per 0.5 g tissue) and the mixture is
homogenized using first the loose- and then the tight-fitting
glass pestle. This takes approximately 5 strokes with each pestle
and is carried out at room temperature because urea would
crystallize out in the cold. Once the liver sample is thoroughly
homogenized in the solubilizer, it is assumed that all the proteins
are denatured (by the chaotropic effect of the urea and NP-40
detergent) and the enzymes inactivated by the high pH (~9.5).
Therefore these samples may be kept at room temperature until they
can be centrifuged or frozen as a group (within several hours of
preparation). The samples are centrifuged for 6 × 10^6 g·min (e.g.,
500 000 × g for 12 min using a Beckman TL-100 centrifuge). The
centrifuge rotor is maintained at just below room temperature
(e.g., 15-20 °C), but not too cold, so as to prevent the
precipitation of urea. The centrifuge of choice is a Beckman TL-100
because of the sample tube sizes available, but any ultracentrifuge
accepting smallish tubes will suffice. When an appropriate
centrifuge is not available near the site of sample preparation,
samples can be frozen at -80 °C and thawed prior to centrifugation
and collection of supernatants. Each supernatant is carefully
removed following centrifugation and aliquoted into at least 4
clean tubes for storage. This is done by transferring all the
supernatant to one clean tube, mixing this gently (to assure
homogeneous composition) and then dividing it into 4 aliquots. The
aliquots are frozen immediately at -80 °C. These multiple aliquots
can provide insurance against a failed run or a freezer breakdown.
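
The arithmetic behind the sample preparation (8 volumes of solubilizer per unit of tissue, and a fixed g × min product for the spin) can be sketched as follows. This is only an illustration: the function names are hypothetical, and 1 g of tissue is assumed to count as 1 mL when computing "volumes".

```python
def solubilizer_volume_ml(tissue_g, volumes=8.0):
    """Solubilizer volume in mL, assuming 1 g of tissue counts as 1 mL."""
    return volumes * tissue_g


def spin_minutes(rcf_g, target_g_min=6e6):
    """Minutes needed at a given relative centrifugal force (in g)
    to reach the target g x min product."""
    return target_g_min / rcf_g


print(solubilizer_volume_ml(0.5))  # 4.0 mL for a 0.5 g sample
print(spin_minutes(500_000))       # 12.0 min at 500 000 x g
```

The same helper gives the run time for a slower rotor, which may be useful when a TL-100 is not available.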


2.2 Two-dimensional electrophoresis

Sample proteins are resolved by 2-D electrophoresis using the
20 × 25 cm Iso-Dalt® 2-D gel system ([26-29]; produced by LSB and
by Hoefer Scientific Instruments, San Francisco) operating with 20
gels per batch. All first-dimensional isoelectric focusing (IEF)
gels are prepared using the same single standardized batch of
carrier ampholytes (BDH 4-8A in the present case, selected by LSB's
batch-testing program for rat and mouse database work**). A 10 µL
sample of solubilized liver protein is applied to each gel, and the
gels are run for 33 000 to 34 500 volt-hours using a progressively
increasing voltage protocol implemented by a programmable
high-voltage power supply. An Angelique™ computer-controlled
gradient-casting system (produced by LSB) is used to prepare
second-dimensional sodium dodecyl sulfate (SDS) polyacrylamide
gradient slab gels in which the top 5% of the gel is 11%T
acrylamide, and the lower 95% of the gel varies linearly from 11%
to 18%T. This system has recently been modified so as to employ a
commercially available 30.8%T acrylamide/N,N'-methylenebisacrylamide
prepared solution (thus avoiding the handling of the solid
acrylamide monomer) and three additional stock solutions: buffer
(made from Sigma pre-set Tris), persulfate and
N,N,N',N'-tetramethylethylenediamine (TEMED). Each gel is identified
by a computer-printed filter paper label polymerized into the lower
left corner of the gel. First-dimensional IEF tube gels are loaded

** This material (succeeding certified batches of which are available
from Hoefer Scientific Instruments) has the most linear pH gradient
produced by any ampholyte tested (except for the Pharmacia wide range,
which has an unacceptable tendency to bind high-molecular weight
acidic proteins, causing them to streak).



directly (as extruded) onto the slab gels without equilibration,
and held in place by polyester fabric wedges (Wedgies™, produced by
LSB) to avoid the use of hot agarose. Second-dimensional slab gels
are run overnight, in groups of 20, in cooled DALT tanks (10 °C)
with buffer circulation. All run parameters, reagent source and lot
information, and notations of deviation from expected results are
entered by the technician responsible on a detailed, multi-page
record of the experiment.
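
The slab gel profile described above (a constant 11%T over the top 5% of the gel, then a linear rise to 18%T at the bottom) can be written as a small function. This is a sketch of the stated profile only; the function name and the convention that depth runs from 0 (top) to 1 (bottom) are assumptions made for illustration.

```python
def percent_t(depth_fraction, top_t=11.0, bottom_t=18.0, plateau=0.05):
    """Acrylamide %T at a fractional depth (0 = top of slab, 1 = bottom)."""
    if not 0.0 <= depth_fraction <= 1.0:
        raise ValueError("depth_fraction must lie between 0 and 1")
    if depth_fraction <= plateau:
        # Top 5% of the gel is cast at a constant concentration.
        return top_t
    # Remaining 95% varies linearly from top_t to bottom_t.
    span = (depth_fraction - plateau) / (1.0 - plateau)
    return top_t + (bottom_t - top_t) * span


for depth in (0.0, 0.05, 0.525, 1.0):
    print(depth, round(percent_t(depth), 2))
```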

2.3 Staining

Following SDS-electrophoresis, slab gels are stained for 
protein using a colloidal Coomassie Blue G-250 procedure 
in covered plastic boxes, with 10 gels (totalling approxima- 
tely 1 L of gel) per box. This procedure (based on the work 
of Neuhoff [30, 31]) involves fixation in 1.5 L of 50% ethanol
and 2% phosphoric acid for 2 h, three 30 min washes,
each in 2 L of cold tap water, and transfer to 1.5 L of 34% 
methanol, 17% ammonium sulfate and 2 % phosphoric acid 
for 1 h, followed by the addition of a gram of powdered Coo- 
massie Blue G-250 stain. Staining requires approximately 4 
days to reach equilibrium intensity, whereupon gels are 
transferred to cool tap water and their surfaces rinsed to re- 
move any particulate stain prior to scanning. Gels may be 
kept for several months in water with added sodium azide. 
The water washes remove ethanol that would dissolve the 
stain (and render the system noncolloidal, with high backgrounds).
The concentrated ammonium sulfate and methanol solution is
diluted by equilibration with the water volume of the gels
to automatically achieve the correct final
concentrations for colloidal staining. Practical advantages 
of this staining approach can be summarized as follows: (i) 
the low, flat background makes computer evaluation of 
small spots (max OD < 0.02) possible, especially when 
using laser densitometry; (ii) up to 1500 spots can be reli- 
ably detected on many gels (e.g., rat liver) at loadings low 
enough to preserve excellent resolution; and (iii) reprodu- 
cibility appears to be very good: at least several hundred 
spots have coefficients of reproducibility less than 15%. 
This value is at least as good as previous CBB methods, and 
significantly better than many silver stain systems. 
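
As an illustration of how such a reproducibility figure might be computed, the sketch below takes the "coefficient of reproducibility" to be the coefficient of variation (sample standard deviation as a percentage of the mean) across replicate gels. That reading is an assumption, and the abundances are invented for the example.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean abundance is zero")
    return 100.0 * stdev(values) / m


def reproducible_spots(abundances, max_cv=15.0):
    """Spot ids whose CV across replicate gels is below max_cv percent."""
    return [spot for spot, vals in abundances.items()
            if coefficient_of_variation(vals) < max_cv]


# Made-up integrated densities for three hypothetical spots on four gels.
replicates = {
    "spot_a": [100.0, 104.0, 97.0, 101.0],
    "spot_b": [50.0, 80.0, 35.0, 62.0],
    "spot_c": [12.0, 12.5, 11.8, 12.1],
}
print(reproducible_spots(replicates))
```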

2.4 Positional standardization 

The carbamylated rabbit muscle creatine phosphokinase 
(CPK) standards [32] are purchased from Pharmacia and 
BDH. Amino acid compositions, and numbers of residues 
present in proteins used for internal standardization, are 
taken from the Protein Identification Resource (PIR) sequence
database [33].

2.5 Computer analysis 

Stained slab gels are digitized in red light at 134 micron re- 
solution, using either a Molecular Dynamics laser scanner 
(with pixel sampling) or an Eikonix 78/99 CCD scanner. 
Raw digitized gel images are archived on high-density DAT 
tape (or equivalent storage media) and a greyscale video- 
print prepared from the raw digital image as hard-copy 
backup of the gel image. Gels are processed using the Kep- 
ler® software system (produced by LSB), a commercially 
available workstation-based software package built on 
some of the principles of the earlier TYCHO system [34-41].
Procedure PROC008 is used to yield a spotlist giving
position, shape and density information for each detected 
spot. This procedure makes use of digital filtering, mathe- 
matical morphology techniques and digital masking to re- 
move the background, and uses full 2-D least-squares opti- 
mization to refine the parameters of a 2-D Gaussian shape 
for each spot. Processing parameters and file locations are 
stored in a relational database, while various log files detailing
operation of the automatic analysis software are archived with
the reduced data. The computed resolution and
level of Gaussian convergence of each gel are inspected 
and archived for quality control purposes. 
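
To make the spot model concrete, the sketch below generates one synthetic 2-D Gaussian spot and recovers its volume, centre and widths from image moments. Moments are used here as a simple stand-in: this is not PROC008 and it does not perform the least-squares refinement described above. It also assumes the background has already been removed and that the window holds a single spot. All names and values are hypothetical.

```python
import math


def gaussian_spot(x, y, amp, x0, y0, sx, sy):
    """Axis-aligned 2-D Gaussian used as the spot model."""
    return amp * math.exp(-0.5 * (((x - x0) / sx) ** 2 + ((y - y0) / sy) ** 2))


def moment_estimate(image):
    """Estimate spot parameters from image moments.

    image is a list of rows (y) of background-free pixel values (x).
    """
    total = sum(sum(row) for row in image)
    if total <= 0:
        raise ValueError("image contains no signal")
    x0 = sum(v * x for row in image for x, v in enumerate(row)) / total
    y0 = sum(v * y for y, row in enumerate(image) for v in row) / total
    var_x = sum(v * (x - x0) ** 2
                for row in image for x, v in enumerate(row)) / total
    var_y = sum(v * (y - y0) ** 2
                for y, row in enumerate(image) for v in row) / total
    return {"volume": total, "x0": x0, "y0": y0,
            "sx": math.sqrt(var_x), "sy": math.sqrt(var_y)}


# Synthetic 41 x 41 pixel window containing one invented spot.
window = [[gaussian_spot(x, y, 1.0, 20.0, 18.0, 3.0, 2.0) for x in range(41)]
          for y in range(41)]
fit = moment_estimate(window)
print({k: round(v, 3) for k, v in fit.items()})
```

In a real analysis these moment estimates would at most serve as starting values for an iterative fit, since overlapping spots and residual background bias them.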

Experiment packages are constructed using the Kepler ex- 
periment definition database to assemble groups of 2-D 
patterns corresponding to the experimental groups (e.g., 
treated and control animals). Each 2-D pattern is matched 
to the appropriate "master" 2-D pattern (pattern 
F344MST3 in the case of Fischer 344 rat liver), thereby 
providing linkage to the existing rodent protein 2-D data- 
bases. The software allows experiments containing hun- 
dreds of gels to be constructed and analyzed as a unit, with 
up to 100 gels displayed on the screen at one time for com- 
parative purposes and multiple pages to accommodate ex- 
periments of > 1000 gels. For each treatment, proteins 
showing significant quantitative differences vs. appropriate 
controls are selected using group-wise statistical parameters
(e.g., Student's t-test, Kepler® procedure STUDENT).
Proteins satisfying various quantitative criteria (such as P< 
0.001 difference from appropriate controls) are repre- 
sented as highlighted spots onscreen or on computer-plot- 
ted protein maps and stored as spot populations (i.e., logi- 
cal vectors) in a liver protein database. Quantitative data 
(spot parameters, statistical or other computed values) are 
stored as real-valued vectors in the database. Analysis of
coregulation is performed using a Pearson product-moment
correlation (Kepler procedure CORREL) to determine whether groups
of proteins are coordinately regulated by any of the treatments.
Such groups can be presented graphically on a protein map, and
reported together with the statistical criteria used to assess the
level of coregulation. Multivariate statistical analysis (e.g.,
principal components analysis) is performed on data exported to
SAS (SAS Institute).
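
The two statistics named above can be sketched in a few lines. The functions below are generic textbook forms (a pooled-variance two-sample t statistic and a Pearson product-moment correlation), not the Kepler procedures STUDENT and CORREL, whose internals are not described here. The abundance values are invented, and conversion of t to a P value is omitted.

```python
import math


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((v - ma) ** 2 for v in a)
    ssb = sum((v - mb) ** 2 for v in b)
    pooled = (ssa + ssb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented abundances of one spot in five control and five treated animals.
control = [10.0, 11.0, 9.0, 10.0, 10.0]
treated = [20.0, 22.0, 19.0, 21.0, 18.0]
print(round(student_t(treated, control), 2))

# Invented abundances of two spots across the same six gels.
spot_p = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
spot_q = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
print(round(pearson_r(spot_p, spot_q), 3))
```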

2.6 Graphical data output 

Graphical results are prepared in GKS and translated 
within Kepler® into output for any of a variety of devices.
Linedrawing output is typically prepared as Postscript and 
printed on an Apple LaserWriter. Detailed maps presented 
here have been generated using an ultra-high-resolution 
Postscript-compatible Linotronic output device. Greyscale 
graphics are reproduced from the workstation screen using 
a Seikosha videoprinter. Patterns are shown in the standard 
orientation, with high molecular mass at the top and acidic 
proteins to the left. 

2.7 Experiment LSBC04 

In the study described here 12-week-old Charles River
male F344 rats were used. Diets were prepared at LSB,
based on a Purina 5755M Basal Purified Diet. Lovastatin
and cholestyramine were obtained as prescription
pharmaceuticals, ground and mixed with the diet at concentrations
of 0.075% and 1%, respectively. The high cholesterol diet
was Purina 5801M-A (5% cholesterol plus 1% sodium cholate
in the control diet). Animal work was carried out by
Microbiological Associates (Bethesda, MD). Animals were
acclimatized for one week on the control diet, fed test or
control diets for one week, and sacrificed on day 8. Average
daily doses of lovastatin and cholestyramine in appropriate
groups were 37 mg/kg/day and 5 g/kg/day, respectively,
based on the weight of the food consumed. Liver samples
were collected and prepared for 2-D electrophoresis according
to the standard liver protocol (homogenization in 8
volumes of 9 M urea, 2% NP-40, 0.5% dithiothreitol, 2%
LKB pH 9-11 carrier ampholytes, followed by centrifugation
for 30 min at 80 000 × g). Kidney, brain and [...]
samples were frozen. Gels were run as described above
and the data were analyzed using the Kepler® system. Gels
were scaled, to remove the effect of differences in protein
loading, by setting the summed abundances of a large number
of matched spots equal for each gel (linear scaling).
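
A minimal sketch of this kind of linear scaling is given below. It assumes each gel is held as a mapping from spot identifier to abundance and that a fixed set of reference spots has been matched on every gel. The data layout, the function name and the choice of the mean unscaled sum as the common target are illustrative assumptions, not details of the Kepler implementation.

```python
def linear_scale(gels, reference_spots, target=None):
    """Scale each gel so the summed abundance of the reference spots is equal.

    gels: {gel_id: {spot_id: abundance}}
    reference_spots: spot ids matched on every gel
    target: common sum to scale to (defaults to the mean unscaled sum)
    """
    sums = {g: sum(spots[s] for s in reference_spots)
            for g, spots in gels.items()}
    if target is None:
        target = sum(sums.values()) / len(sums)
    return {g: {s: v * target / sums[g] for s, v in spots.items()}
            for g, spots in gels.items()}


# Two invented gels; gel2 was loaded with twice as much protein as gel1.
gels = {
    "gel1": {"a": 10.0, "b": 30.0, "c": 5.0},
    "gel2": {"a": 20.0, "b": 60.0, "c": 4.0},
}
scaled = linear_scale(gels, reference_spots=["a", "b"])
print(scaled)
```

After scaling, spots a and b agree between the two gels, while spot c still differs, which is the kind of residual difference the statistical tests are meant to pick up.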



3 Results and discussion 

3.1 The rat liver protein 2-D map 

F344MST3 is a standard 2-D pattern of rat liver proteins 
based on the Fischer 344 strain. This pattern was initiated
from a single 2-D gel and extensively edited in an experi- 
ment comparing it to a range of protein loads, so as to in- 
clude both small spots and well-resolved representations of 
high-abundance spots. More than 700 rat liver 2-D patterns 
have been matched to F344MST3 in a series of drug effects 
and protein characterization experiments, and numerous 
new spots (induced by specific drugs, for instance) have 
been added as a result. A modified version including addi- 
tional spots present in the Sprague-Dawley outbred rat has 
also been developed (data not shown). Figure 1 shows a 
greyscale representation and Fig. 2 a schematic plot of the 
master pattern. More than 1200 spots are included, most of 
which are visible on typical gels loaded with 10 µL of solubilized
liver protein prepared by the standard method and
stained with colloidal Coomassie Blue. Master spot num- 
bers (MSN's) have been assigned to all proteins, and ap- 
pear in the following figures, each showing one quadrant of 
the pattern. Figure 3 shows the upper left (acidic, high 
molecular mass) quadrant, Fig. 4 the upper right (basic, 
high molecular mass) quadrant, Fig. 5 the lower left (acidic,
low molecular mass) quadrant, and Fig. 6 the lower right 
(basic, low molecular mass) quadrant. The quadrants over- 
lap as an aid to moving between them. The gel position (in 
100 micron units), isoelectric point (relative to the CPK
internal pI standards) and SDS molecular mass (from the
calibration curve in Fig. 8) are listed for each spot (Table 1).
Because of the precision of the CPK-pI values, these parameters
can be used to relate spot locations between gel systems more
reliably than using pI measurements expressed as pH. A major
objective of current studies is the identification of all major
spots corresponding to known liver proteins, as well as rigorous
definitions of subcellular organelle contents. Of particular
interest to us is the parallel development of identifications in
the rat and mouse liver maps, allowing detailed comparisons of
gene expression effects in the two systems. The results of these
studies will be presented systematically in a later edition of
this database. We include here a useful series of 22 orienting
identifications as an aid to other users of the rat liver pattern
(Table



3.2 Carbamylated charge standards, computed pIs and
molecular mass standardization

We have previously shown that the use of a system of closely
spaced internal pI markers (made by carbamylating a
basic protein) offers an accurate and workable solution to
the problem of assigning positions in the pI dimension [32].
The same system, based on 36 protein species made by
carbamylating rabbit muscle CPK, has been used here to assign
pIs to most rat liver acidic and neutral proteins. The
standards were coelectrophoresed with total liver proteins,
and the standard spots added to a special version of the
master pattern F344MST3. The gel X-coordinates of all
liver protein spots lying within the CPK charge train were
then transformed into CPK pI positions by interpolation
between the positions of immediately adjacent standards
(Table 1) using a Kepler® vector procedure.
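
The interpolation step can be illustrated with a short sketch. Each standard is represented as a pair of gel X-position and charge index, and a spot's CPK pI is obtained by linear interpolation between the two standards that bracket it. The function name and the standard positions below are hypothetical; the Kepler vector procedure itself is not reproduced here.

```python
from bisect import bisect_right


def cpk_pi(x, standards):
    """Interpolate a CPK pI (in charge units) for gel X-coordinate x.

    standards: (x_position, charge_index) pairs for the carbamylated
    CPK spots. Returns None for spots outside the charge train.
    """
    pts = sorted(standards)
    xs = [p[0] for p in pts]
    if not xs[0] <= x <= xs[-1]:
        return None
    i = bisect_right(xs, x)
    if i == len(xs):
        # x sits exactly on the last standard.
        return float(pts[-1][1])
    (x_lo, c_lo), (x_hi, c_hi) = pts[i - 1], pts[i]
    return c_lo + (c_hi - c_lo) * (x - x_lo) / (x_hi - x_lo)


# Hypothetical positions (100 micron units) of four adjacent standards.
train = [(2000, 0), (1900, -1), (1790, -2), (1700, -3)]
print(cpk_pi(1845, train))  # halfway between the -2 and -1 standards
print(cpk_pi(1600, train))  # outside the train
```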

It has proven possible to compute fairly accurate pI values
for many proteins from the amino acid composition [42].
We have attempted here to test a further elaboration of this
approach, in which we computed pIs for the CPK standards
themselves, based on our knowledge of the rabbit muscle
CPK sequence and the fact that adjacent members of the
charge train typically differ by blockage of one additional
lysine residue (Table 3). We compared these values to similar
computed pIs for an additional set of carbamylated standards
made from human hemoglobin beta chains and a series of rat
liver and human plasma proteins of known position and sequence
(Fig. 7, Table 4). The result demonstrates good concordance
between these systems. Two proteins show significant deviations:
liver fatty-acid binding protein (FABP; #1 in Table 4) and
protein disulphide isomerase (#20 in the table). The FABP spot
present on F344MST3 may represent a charge-modified version of
a more basic parent spot closer to the expected pI, not resolved
in the IEF/SDS gel. Of particular importance is the fact that, by
comparing computed pIs of sequenced but unlocated proteins
with the CPK pIs, we can assign a probable gel location
without making any assumptions regarding the actual
gel pH gradient. This offers a useful shortcut, given the
vagaries of pH measurement on small diameter IEF gels. We
have used this approach to compute the CPK pIs of all rat
and mouse proteins in the PIR sequence database, as an aid
to protein identification (data not shown).
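
The idea of computing a pI from composition, and of stepping down a charge train by blocking one lysine at a time, can be sketched with a generic Henderson-Hasselbalch model solved by bisection. The pKa values are illustrative textbook-style numbers and the composition is invented; neither is taken from reference [42] or from the rabbit muscle CPK sequence.

```python
# Illustrative side-chain and terminal pKa values (assumed, not from [42]).
PKA_POSITIVE = {"N_term": 8.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C_term": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(groups, ph):
    """Net charge at a given pH from counts of ionizable groups."""
    pos = sum(n / (1.0 + 10.0 ** (ph - PKA_POSITIVE[g]))
              for g, n in groups.items() if g in PKA_POSITIVE)
    neg = sum(n / (1.0 + 10.0 ** (PKA_NEGATIVE[g] - ph))
              for g, n in groups.items() if g in PKA_NEGATIVE)
    return pos - neg


def computed_pi(counts, blocked_lysines=0, tol=1e-4):
    """pH of zero net charge, with some lysines blocked by carbamylation."""
    groups = dict(counts)
    groups["K"] = groups.get("K", 0) - blocked_lysines
    if groups["K"] < 0:
        raise ValueError("more lysines blocked than present")
    groups.setdefault("N_term", 1)
    groups.setdefault("C_term", 1)
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(groups, mid) > 0:
            lo = mid  # still positively charged, so the pI lies higher
        else:
            hi = mid
    return (lo + hi) / 2.0


# Invented composition (counts of ionizable residues only).
composition = {"K": 30, "R": 18, "H": 15, "D": 28, "E": 27, "C": 4, "Y": 9}
charge_train = [computed_pi(composition, k) for k in range(4)]
print([round(p, 2) for p in charge_train])
```

Each additional blocked lysine removes one positive charge, so the computed pI falls step by step, which mirrors the way a carbamylation train spreads toward the acidic side of the gel.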




In order to standardize SDS molecular weight (SDS-MW), we
have used a standard curve fitted to a series of identified
proteins (Fig. 8). Rather than using molecular mass per se,
we have elected to use the number of amino acids in the
polypeptide chain, as perhaps a better indication of the
length of the SDS-coated rod that is sieved by the second
dimension slab. The resulting values were multiplied by
the weighted average mass of amino acids in sequenced
proteins to give predicted molecular masses. Because we use
gradient slabs, we have not constrained the fitted curve to
conform to any predetermined model; rather we tried many
equations and selected the best using the program "Tablecurve"
on a PC. The equation chosen was $y = a + bx + c/x^{2}$, where
y is the number of residues, x is the gel Y-coordinate, a is
511.83, b is -0.2731 and c is 33183801. The resulting fit appears
to be fairly good over a broad range of molecular mass.
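
The standard curve can be evaluated directly, as in the sketch below. It takes the last term of the fitted equation to be c/x^2 and uses the coefficients quoted in the text. The average residue mass is passed in as an argument rather than hard-coded, because its value is not reproduced here; the figure used in the example call is purely illustrative, and the function names are hypothetical.

```python
A, B, C = 511.83, -0.2731, 33183801.0  # coefficients quoted in the text


def residues_from_y(y_coord):
    """Number of residues predicted from the gel Y-coordinate."""
    if y_coord <= 0:
        raise ValueError("Y-coordinate must be positive")
    return A + B * y_coord + C / y_coord ** 2


def predicted_mass(y_coord, mean_residue_mass):
    """Predicted molecular mass for a caller-supplied average residue mass."""
    return residues_from_y(y_coord) * mean_residue_mass


print(round(residues_from_y(434), 1))   # about 569 residues
print(round(predicted_mass(434, 110.0)))  # 110.0 is an illustrative value only
```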

3.3 An example of rat liver gene regulation: Cholesterol
metabolism

Experiment LSBC04 was designed as a small-scale test of 
the regulation of cholesterol metabolism in vivo by three 
agents included in the diet: lovastatin (Mevacor®, an inhibitor
of HMG-CoA reductase); cholestyramine (a bile acid
sequestrant that has the effect of removing cholesterol 
from the gut-liver recirculation); and cholesterol itself. The 
first two agents should lower available cholesterol and the 
third should raise it, allowing manipulation of relevant 
gene expression control systems in both directions. Such 
an experiment offers an interesting test of the 2-D mapping 
system since most of the pathway enzymes are present in 
low abundance, many are membrane-bound and difficult 
to solubilize, and the pathway itself is complex. Approximately
1000 proteins were separated and detected in liver homogenates.
Twenty-one proteins were found to be affected
by at least one treatment, and these could be divided into 
several coregulated groups. 

3.3.1 MSN 413 (putative cytosolic HMG-CoA synthase) 
and sets of spots regulated coordinately or inversely 

One group of spots (including a spot assigned to the cyto- 
solic HMG-CoA synthase, MSN 413) showed the expected 
increase in abundance with lovastatin or cholestyramine, 
the synergistic further increase with lovastatin and choles- 
tyramine, and a dramatic decrease with the high cholesterol 
diet. Spot number 413 is the most strongly regulated pro- 
tein in the present experiment, showing a 5- to 10-fold in- 
duction after a 1 week treatment with 0.075% lovastatin and 
1% cholestyramine in the diet (Figs. 9 and 10). Its expres- 
sion follows precisely the expectation for an enzyme whose 
abundance is controlled by the cholesterol level; it is pro- 
gressively increased from the control levels by cholestyra- 
mine, lovastatin and lovastatin plus cholestyramine, and it 
sinks below the threshold of detection in animals fed the 
high cholesterol diet. This spot has been tentatively identified
as the cytosolic HMG-CoA synthase, based on a reaction with an
antiserum to that protein provided by Dr. Michael Greenspan at
Merck Sharp & Dohme Research Laboratories. This enzyme lies
immediately before HMG-CoA reductase in the liver cholesterol
biosynthesis pathway, and is known to be co-regulated with it.
Spot 413 has an SDS molecular weight of about 54 000 and a CPK
pI of -11.4, in reasonably close agreement with a molecular
weight of 57 300 and a CPK pI of -15.7 computed from the known
sequence of the hamster enzyme [43].

Using a classical product-moment correlation test (Kepler 
procedure CORREL), a series of five additional spots was 
found to be coregulated with 413. The level of correlation 
was exceedingly high (P > 95%). Two of these, 1250 and 933,
are at similar molecular weights and approximately one 
charge more acidic than 413 (Fig. 9), indicating that they 
may be covalently modified forms of the 413 polypeptide. 
This suspicion is strengthened by the observation that both 
spots are also stained by the antibody to cytosolic HMG-CoA
synthase. The remaining three correlated spots appear
to comprise an additional related pair (1253 and 1001) of
around 40 kDa and a single spot (1119) of around 28 kDa.
Because these two presumed proteins are present at substantially
lower abundances than 413, and because the cytosolic HMG-CoA
synthase is reported to consist of only one type of polypeptide,
they are likely to represent other, very tightly coregulated
enzymes. A second group of six spots was selected based on a
regulatory pattern close to the inverse of that for spot 413
(MSN's 34, 79, 178, 182, 204, 347; data not shown). For these
proteins, the lowest level of expression occurs with exposure to
lovastatin plus cholestyramine and the highest level upon
exposure to the high-cholesterol diet. Spots 182 and 79 are
highly correlated and lie about one charge apart at the same
molecular weight; they may thus be isoforms of a single protein.
The other four spots probably represent additional enzymes or
subunits.

3.3.2 MSN 235 and coregulated spots

A third group of five spots, mainly comprised of mitochondrial
proteins including putative mitochondrial HMG-CoA synthase
spots, showed a modest induction by lovastatin alone, but little
or no effect with any of the other treatments (including the
combination of lovastatin and cholestyramine; Fig. 12). This
result is intriguing because lovastatin was expected to affect
only the regulation of enzymes of cholesterol synthesis, which
is entirely extra-mitochondrial. Three of the spots (235, 134,
144) form a closely-packed triad at approximately 30 kDa, and
are likely to represent isoforms of one protein. All three spots
are stained
by an antibody to the mitochondrial form of HMG-CoA 
synthase obtained from Dr. Greenspan. Subcellular fractio- 
nation indicates a mitochondrial location. The other two 
spots (633 at about 38 kDa and 724 at about 69 kDa) are 
each present at lower abundance than the members of the 
triad. 
3.3.3 An example of an anti-synergistic effect

A sixth spot (367) shows strong induction by lovastatin
(two- to threefold), and about half as much induction with
lovastatin plus cholestyramine, but without sharing the
animal-animal heterogeneity pattern of the 235-set (Fig. 13).
This protein is also mitochondrial, and represents the clearest
example of an anti-synergistic effect of lovastatin and
cholestyramine. The existence of such an effect demonstrates
that lovastatin and cholestyramine do not act exclusively
through the same regulatory pathway.

3.3.4 Complexity of the cholesterol synthesis pathway

Taken together, these results suggest that treatment with
lovastatin alone can affect both cytosolic and mitochondrial
pathways using HMG-CoA, while cholestyramine, on the
other hand, either alone or in combination with lovastatin,
produces a strong effect on the putative cytosolic pathway
but little or no effect on the putative mitochondrial pathway.
An explanation for this difference may lie in lovastatin's
effect on levels of HMG-CoA and related precursor
compounds that are exchanged between the cytosol and
the mitochondrion, whereas cholestyramine should affect
only the cytosolic pathways directly controlled by cholesterol
and bile acid levels. It remains to be explained why some
proteins of the putative mitochondrial pathway are
much more variable in their expression in all groups.

Examination of all the coregulated groups suggests that
quantitative statistical techniques can extract a wealth of
interesting information from large sets of reproducible gels. The
abundance of spots in the 413 coregulation group, for example,
shows an amazing level of concordance in their expression among
the five individuals of the lovastatin plus cholestyramine
treatment group. This effect is not due to differences in total
protein loading, since they have already been removed by scaling,
and since proteins with different regulation patterns can be
demonstrated (e.g., Fig. 13). Such effects raise the possibility
that many gene coregulation sets may be revealed through the
study of a sufficiently large population of control animals
(i.e., without any experimental manipulation). This approach,
exploiting natural biological variation in protein expression
instead of drug effects, offers an important incentive for the
construction of a large library of control animal patterns.



4 Conclusions

Because of the widespread use of rat liver in both basic
biochemistry and in toxicology, there is a long-term need for a
comprehensive database of liver proteins. The rat liver master
pattern presented here has proven to be an accurate
representation of this system, having been matched to more
than 700 gels to date. As the number of proteins identified
and the number of compounds tested for gene expression
effects grows, we expect this database to contribute valuable
insights into gene regulation. Its practical utility in several
areas of mechanistic toxicology is already being demonstrated.

Received September 11, 1991

6 Addendum 1: Figures 1-13




Figure 1. Synthetic representation of the standard rat liver 2-D master pattern, rendered as a greyscale image using a videoprinter.
Figure 2. Schematic representation of the master pattern (the same as Fig. 1), useful as an aid in relating specific areas of Fig. 1 and the following detailed
quadrants.
Figure 3. Upper left (high molecular weight, acidic) quadrant (#1) of the rat liver map, showing spot numbers.
Figure 7. (a) Plot of computed isoelectric point versus gel X-position for
two sets of carbamylated standard proteins (rabbit muscle CPK and
human hemoglobin β chain, filled diamonds) and several other proteins
(shaded squares). (b) The identities of the various proteins represented
by the squares are indicated by the numbers in corresponding positions
on (a); these refer to Table 4.
Figure 9. Montage showing effects in the
region of MSN:413. The montage shows a
small window into one portion of the 2-D
pattern, one row of windows for each
experimental group, and one panel for each gel
in the experiment. The left-most pattern
in each row is a group-specific copy of the
master pattern, followed by the patterns
for the five individual rats in the group.
The highlighted protein spots (filled circles)
are spot 413 (on the right of each panel;
identified as cytosolic HMG-CoA synthase)
and two modified forms of it (1250
and 933). From the top, the rows (experimental
groups) are: high cholesterol, controls,
cholestyramine, lovastatin, and lovastatin
plus cholestyramine.
Figure 10. Bargraph showing the quantitative
effects of various treatments on the
abundance of MSN:413 (cytosolic HMG-CoA
synthase) in the gels of Fig. 9.




Figure 11. Bargraphs of a series of six coregulated
spots including MSN:413. In the
bargraphs, the abundances of the appropriate
spot (master spot number shown at
the top of the panel) in each animal are
shown. The five five-animal groups are in
the order (left to right): high cholesterol,
controls, cholestyramine, lovastatin, and
lovastatin plus cholestyramine. Each bar
within a group represents one experimental
animal liver (one 2-D gel). Note the correlated
expression of the 6 spots, especially
in the two far right (most strongly
induced) groups.
Figure 12. Data on a second coregulated
group of spots, presented as in Fig. 11. The
fourth experimental group (lovastatin)
shows a modest induction, while the fifth
group (lovastatin plus cholestyramine)
does not.







Figure 13. Data on spot MSN:367, presented as in Fig. 11. This protein
shows unambiguously the anti-synergistic effect of lovastatin and
cholestyramine (fifth group) as compared to lovastatin (fourth group). This
response contrasts strongly with the regulation pattern seen in Fig.
k 3



11 
15 
17 
18
19 
20 

21
22 
23 
24 
25 
27 
26 
29 
30 
32 
33 
34 
35 
36
38 
39 
41 
42 
43 
44 
46 
47 
48
49 
50
51 
52 

-4 

53 
54 
55 
56 
57 
58 
59 
60
61 
62 
65 
66
67 
68 
69 
71 
72 
73 
74 
75 
76 
77 
78 
79 
80 
81 
82 
83 
84 
85 
86 
87 
88 
89 
90 
91 
92 
93 
94 



311 
568 
612 
549 
845 
629 
906 
755 
649 
1204 
332 
787 
313 
807 
1184 
1263 
743 
768 
1216 
1145 
1037 
863 
712 
763 
304 
1165 
684 
1318 
1924 
1203 
1391 
309 
605 
621 
1113 
1620 
725 
2001 
722 
678 
1682 
1091 
1171 
1400 
1853 
1888 
735 
1263 
1252 
779 
1064 
656 
638 
1582 
1570 
1264 
1338 
1833 
1767 
925 
534 
1811 
1412 
1471 
1662 
1596 
1817 
516 
1589 
1706 
651 
1415 
1773 
1338 
1708 



434 

263 
426 
268 
520 
589 
414 



403 
448 
434 
424 
417 
516 
524 
446 
605 
112 
417 
445 
555 
412 
606 
694 
470 
569 
607 
569 
362 
566 
447 
454 
587 
535 
522 
499 
177 
500 
830 
533 
302 
580 
585 
624 
506 
567 
297 
312 
407 
692 
296 
569 
545 
583 
556 
621 
564 
363 
565 
738 
696 
363 
681 
347 
563 
479 
301 
1371 
698 
719 
329 
710 
545 
446 
696 



<-35.0 
-24.3 
-16.0 
-25.2 
-15.3 
-21 .6 
-14.0 
-17.5 
-20.9 
-8.7 
<-35.0 
-16.6 
<-35.0 
-16.1 
-9.0 
-8.0 
-17.8 
-17.2 
-8.6 
-9.5 
-11.3 
-14.9
-18.7 
-17.3 
<-35.0 
-9.2 
-19.6 
-7.3 
-0.1 
-8.7 
-6.3 
<-35.0 
-22.5 
-21.8 
-10.0 
-0.9
-18.3 
>0.0 
-18 4 
-19.8 
-2.5 
-10.3 
-9.2 
-6.2 
-0.6 
-0.4
-18.1 
-8.0 
-8.1 
-16.8 
-10.8 
-20.6 
-21.2 
-3.6 
-3.8 
-8.0 
-7.0 
-0.8
-1.5 
-13.6 
-26.1 
-1.0 
-6.0 
-5.0 
-2.7 
-3.4 
-0.9 
-27.0 
-3.5 
-2.2 
-20.8 
-6.0 
-1.4 
-7.0
-2.2 



Y   CPK pI   SDS-MW



63,800
102,900
64,600
101,000
55,200
50,000
66,300
90,200
67,900
62,100
63,600
65,000
66,000
55,500
54,900
62,400
49,000
348,600
66,000
62,500
52,400
66,600
48,900
43,800
59,800
51,400
48,800
50,000
74,600
50,200
62,300
61,500
50,100
53,900
55,000
57,000
170,800
56,900
37,300
54,100
89,000
50,600
50,300
47,800
56,200
51,500
90,500
65,900
67,300
43,900
90,800
50,000
53,100
50,400
52,300
48,000
51,600
74,400
51,700
41,600
43,600
74,500
44,500
77,500
51,800
58,900
89,100
17,400
43,600
42,500
81,700
43,000
53,200
62,300
43,700



95 
96 

97 
98 
99 
100 
101 
102 
103 
104 
105 
106 
107 
108
109 
110 
111 
113 
114 
115 
116 
117 
118 
120 
121 
122 
123 
124 
125 
126 
127 
128 
129 
130 
131 
132 
133 
134 
135 
136 
137 
138 
139 
140 
141 
142 
143 
144 
145 
146 
147 
148 
149 
150 
151 
152 
153 
154 
155 
156 
157 
158 
159 
160 
161 
162 
164 
166 
167 
168 
169 
170 
171 
172 
173 



1119 
1731 
1033 
1406 
578 
2004 
1106 
462 
665 
773 
312 
1769 
1585 
1692 
1462 
778 
1728 
1191 
1296 
682 
1146 
1548 
1050 
1530 
838 
1572 
23 
621 
1296 
872 
1000 
1229 
1422 
1776 
1930 
660 
666 
1271 
1161 
453 
1858 
1504 
1488 
1689 
311 
1366 
1429 
615 
2006 
2006 
1070 
1347 
541 
1645 
1269 
1507 
1722 
932 
1031 
1970 
1258 
1275 
1663 
1034 
1953 
1020 
1566 
1905 
1340 
1506 
1336 
1969 
800 
476 
919 



536 
756 



565 
1149 
538 
623 
455 
630 
1182 
1117 
509 
720 
807 
593 
516 
700 
680 
185 
907 
610 
849 
577 
828 
423 
712 
1433 
1474 
862 
921 
717 
311 
832 
499 
757 
537 
1019 
862 
1389 
1063 
823 
697 
707 
756 
1417 
915 
346 
1017 
566 
518 
1108 
578 
1481 
760 
236 
911 
448 
503 
294 
664 
163 
417 
820 
527 
771 
1462 
806 
565 
181 
563 
678 
541 
378 
958 
1314 



-9.9 
-2.0 
-11.4 
-6.1 
-23.8 
>0.0 
-10.1 
-28.5 
-20.2 
-17.0 
<-35.0 
-1.5 
-3.6 
-2.4 
-4.8 
-16.9 
-2.0 
-8.9 
-7.5
-19.6 
-9.5 
-4.1 
-11.1 
-4.3 
-15.4 
-3.8 
<-35.0 
•21.9 
-7.5 
-14.7 
-12.0 
•6.4 
•5.8 
-1.4 
-0.1 
-20.4 
-20.2 
-7,9 
-9.3 
-29.7 
-0.6 
-4.6 
-4.8 
-2.4 
<-35.0 
-6.7 
-5.7 
-22.1 
>0.0 
>0.0 
-10.7 
-6.9 
•25.7 
-2.8 
-7.9 
-4.5 
-2.1 
-13.5 
-11.4 
>0.0 
-8.1 
-7.8 
-2.6 
-11.4 
>0.0 
-11.6 
•3.8 
-0.2 
-7.0 
-4.6 
-7.0 
>0.0 
■16.3 
•28.7 
■13.7 



53,800 
40.700 
51.600 
51,700 
25.000 
53.700 
47,900 
61,300 
37,300 
23.800 
26,100 
56,100 
42,500 
38,300 
49,700 
55,500 
43,500 
44,500 
160.800 
34,100 
48.700 
36,500 
50.800 
37,4a 
65,200 
42.90C 
15.30C 
13.90C 
36.00C 
33.50C 
42.60C 
86.10C 
37.30C 
57.00C 
40.70C 
53.80C 
29.70C 
36.00C 
16.80C 
28.10C 
37.70C 
43,700 
43.200 
40.700 
15.800 
33,800 
77,900 
29.800 
51 ,600 
55,300 
26.500 
50.800 
13.700 
40,500 
117,000 
33,900 
62,100 
56.600 
91,400 
44,400 
162.400 
65,900 
37,800 
54,600 
40,000 
13,700 
38,400 
51,700 
164,900 
50,400 
44,700 
53,500 
71,800 
32.100 
19.300 



174 
175 
177 
178 
179 
180 
181 
182 
184 
185 
186 
187 
166 
191 
192 
193 
194 
195 
196 
197 
198 
199 
200 
201 
202 
203 
204 
205 
206 
207 
208 
210 
211 
213 
214 
215 
216 
217 
218 
219 
220 
221 
223 
225 
226 
227 
228 
229 
230 
232 
234 
235 
236 
237 
238 
239 
240 
241 
242 
243 
244 
245 
246 
247 
248 
249 
250 
251 
252 
253 
254 
255 
256 
257 
258 



... database, showing spot master number, gel position (x and y), ... molecular mass (from the standard curve of Fig. 8).



1364 
625 
1562 
1321 
1069 
1666 
411 
804 
1660 
1997 
279 
773 
1538 
1560 
1818 
1469 
1380 
764 
1227 
667 
2006 
1711 
872 
292 
736 
786 
1224 
439 
1994 
1895 
240 
1700 
902 
1067 
1340 
1591 
1565 
1159 
931 
713 
1479 
965 
934 
1812 
621 
1566 
1065 
1577 
1458 
1440 
1692 
618 
920 
952 
1611 
1469 
501 
1820 
1357 
711 
1855 
1189 
551 
1348 
460 
1733 
1974 
606 
674 
753 
995 
1690 
994 
508 
1517 



183 
393 
553 
710 
615 
567 
295 
730 



1017 
1113 
296 
807 
674 
687 
555 
266 
632 
1165 
553 
681 
674 
424 
435 
253 
829 
589 
983 
571 
687 
1418 
499 
517 
684 
668 
495 
755 
393 
572 
177 
911 
927 
716 
1045 
411 
1463 
567 
690 
496 
649 
489 
1004 
1138 
1006 
541 
720 
448 
569 
658 
1182 
621 
474 
459 
604 
448 
451 
788 
392 
553 
646 
450 
679 
1006 
464 
620 



-6.7 
-15.7 
-3.6 
-7.2 
-10.4 
-0.5 
-32.1 
-16.2 
-0.6 
>0.0 
<-35.0 
-17.0 
-4.2 
-3.9 
-0.9 
-5.0 
•6.4 
-16.7 
•6.4 
-20.1 
>0.0 
-2.2 
. -14.7 
<-35.0 
-18.0 
-16.7 
•6.5 
-30.9 
>0.0 
-0.3 
<-35.0 
-2.3 
-14.1 
•10.4 
-7.0 
-3.5 
-3.6 
-9.3 
-13.5 
-18.7 
-4.9 
-12.8 
-13.5 
-1.0 
-15.8 
-3.6 
-10.8 
-3.7 
-5.2 
-5.5 
-Z4 
-22.0 
-13.7 
-13.1 
-3.2 
-4.8 
-27.7 
-0.9 
•6.8 
-18.7 
-0.6 
-8.9 
-25.1 
-6.9 
-29.3 
-1.9 
>0.0 
-16.1 
-14.6 
-17.6 
-12.1 
-2.4 
-12.1 
•27.4 
-44 



162.900 
69.300 
52.600 
43,000 
46.300 
51.600 
91.200 
42,000 
* 34.500 
29,800 
26.300 
90.800 
38.400 
44.900 
44.200 
52.400 
101.600 
47.300 
23,700 
52.600 
44,500 
44,900 
65.000 
63.700 
107.800 
37.400 
50.000 
31.100 
51.300 
44.200 
15.800 
57,000 
55,400 
44,400 
45.200 
57,300 
40,700 
69,300 
51,200 
170,500 
33.900 
33.300 
42,700 
28.600 
66,800 
13,600 
51.600 
34,800 
57.300 
36,500 
57.900 
30.300 
25,400 
30,200 
53,500 
42.500 
62.100 
51.400 
45,800 
23,600 
48.000 
59,300 
61.000 
49.100 
62,100 
61.800 
39.200 
69.500 
52,500 
36.500 
61.900 
44.600 
30.200 
60,400 
37.800 
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250 
260 
261 
262 
263 
265 
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267 
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270 
271 
272 
274 
275 
276 
277 
278 
279 
281 
282 
283 
284 

286 
288 
289 
290 
291 
292 
293 
294 
295 
296 
297 
299 
300 
301 
302 
303 
304 
305 
306 
307 
306 
309 
310 
311 
312 
313 
314 
315 
318 
320 
321 
322 
323 

324 

325 

326 

327 

328 

330 
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1189 
1190 
1191 
1192 
1193 
1194 
1195 
1196 
1197 
1196 
1199 
1200 
1201 
1202 
1203 
1204 
1205 
1208 
1209 
1210 
1211 
1212 
1214 
1215 
1216 
1217 
1218 
1219 
1220 
1221 
1222 
1223 
1224 
1225 
1226 
1227 
1 
1 

1230 
1231 
1232 
1233 
1234 
1235 
1236 
1237 
1238 
1239 
1240 
1241 
1242 
1243 
1244 
1245 



921 
15&4 
637 
623 
665 
564 
552 
538 
545 
1099 
1354 
1366 
1606 
1485 
1459 
1431 
1407 
1383 
1454 
1422 
1394 
1171 
1457 



1158 



265 
403 
344 

505 
572 
639 
637 
614 
637 
1095 
1719 
791 
964 
313 
306 
320 
326 
394 
402 
386 
641 
660 
914 
873 
970 
1021 
1392 
1354 
1362 
673 
614 
603 
696 
707 
475 
466 
759 
1324 
1583 
1865 
1812 
1411 
1392 
794 
769 
740 
743 
713 
662 
663 
565 



400 

397 

397 

528 

529 

524 

514 

522 

586 

539 

702 

224 

224 

223 



224 
182 
183 
182 
214 



1114 
893 
1292 
1275 
1311 
1293 
1502 
1402 
1407 
1431 
1394 
1545 
666 
1021 
195 
194 
197 
197 
294 
294 
294 
329 
329 
266 
245 
372 
298 
205 
203 
205 
540 
542 
539 
623 

628 

447 

1282 
1461 
1170 
1005 

809 

817 

703 

682 

410 

407 

406 

511 

510 

509 

504 

582 



-13.7 
-3.5 
-21 .3 
-21.8 
-20.2 
-24.4 
-25.0 
-25.9 
-25.5 
-10.2 
-7.5 
-6.6 
-3.3 
-4.8 
.-5.2 
-5.7 
-6.1 
-6.4 
-5.3 
-5.8 
-6.3 
-9.2 
-5.2 
-19.5 
<-35.0 
-32.6 
<-35.0 
-27.6 
-24.1 
-21.2 
-21.3 
-22.1 
-21.3 
-10.3 
-2.1 
-16.5 
-12.9 
<-35.0 
<-35.0 
<-35.0 
<-35.0 
-33.2 
-32.7 
-33.7 
-21.2 
-20.4 
-13.8 
-14.7 
-12.7 
-11.6 
-6.3 
-6.8 
-6.7 
-19.9 
-22.1 
-22.6 
-19.2 
-18.9 
-28.7 
-29.0 
-17.4 
-7.2 
-3.6 
-0.6 
-1.0 
•6.0 
-6.3 
-16.4 
-17.1 
-17.9 
-17.8 
-18.7 
-19.6 
-20.3 
-24.4 



24,700 
35.900 
68.400 
66,800 
68.700 
54.500 
54,500 
54.600 
55.700 
55,000 
50,200 
53,700 
43.400 
124,900 
124,900 
125.100 
125.200 
124,700 
164,400 
162,600 
164,300 
131,800 
94.200 
26,200 
34,700 
20,000 
20,600 
19,400 
20,000 
13,000 
16,300 
16,200 
15,400 
16,600 
11.600 
45.200 
29.700 
148,700 
149.800 
147,400 
146,600 
91.400 
91.200 
91,400 
61,600 
81,600 
101,800 
112,000 
72,900 
90,100 
139,500 
141,800 
139.500 
53,600 
53,400 
53,600 
47,800 
47,500 
62,300 
20,400 
14,400 
24,200 
30,300 
38,200 
37.900 
43,400 
44,500 
66,900 
67.300 
67,500 
55,900 
56.000 
56.100 
56.500 
50.500 



1246 

1247 

1249 

1250 

1251 

1252 

1253 

1254 

1255 

1257 

1256 

1259 

1260 

1261 

1262 

1263 

1264 

1265 

1266 

1267 

1266 

1269 

1270 

1271 

1272 

1273 

1274 

1277 

1278 

1279 

1280 

1281 

1282 

1283 

1284 

1285 

1286 

1287 

1288 

1289 

1290 

1291 

1292 

1293 

1294 

1295 



547 

530 
516 
973 
607 
665 
899 
1311 
1300 
1938 
1806 
1727 
1629 
1555 
1466 
1413 
1340 
1263 
1182 
1110 
1055 
999 
959 
905 
857 
810 
774 
737 
702 
671 
645 
617 
595 
573 
552 
536 
515 
496 
467 
447 
427 
412 
397 
381 
365 
348 



577 

576 

572 

536 

532 

529 

766 

746 

761 

712 

718 

715 

713 

717 

717 

722 

717 

717 

720 

717 

717 

717 

715 

712 

714 

705 

711 

706 

711 

710 

710 

707 

704 

700 

CAT 

694 
667 
683 
669 
667 
655 
655 
652 
654 
653 
653 



<- 



-25.3 
-26.3 
-27.0 
-12.7 
-224 
-20.2 
-14.1 
-7.4 
-7.5 
0.0 
-1.0 
-2.0 
-3.0 
-4.0 
-5.0 
-6.0 
-7.0 
•6.0 
-9.0 
-10.0 
-11.0 
-12.0 
-13.0 
-14.0 
-15.0 
-16.0 
•17.0 
-18.0 
-19.0 
-20.0 
-21 .0 
-22.0 
-23.0 
-24.0 
-25.0 
-26.0 
-27.0 
•28.0 
-29.0 
-30.9 
-31.0 
-32.0 
-33.0 
-34.0 
-35.0 
35.0 



50,800 

50,900 

51.200 

53,900 

54.200 

54.400 

40.200 

41.200 

40,400 

42.900 

42.600 

42,700 

42,800 

42,600 

42.600 

42,400 

42.600 

42,600 

42,500 

42.600 

42.600 

42.600 

42.700 

42.900 

42.800 

43,300 

42.900 

43,100 

42,900 

43,000 

43.000 

43,100 

43,300 

43,500 

43,700 

43,800 

44.200 

44.400 

45.200 

45,300 

45.900 

45,900 

46,100 

46,000 

46,100 

46.100 
Table 3. Computed pI's of two sets of carbamylated protein standards: rabbit muscle CPK and human
hemoglobin (Hb)



Protein Name         PIR Name   #ASP  #GLU  #HIS  #LYS  #ARG  NH2-  Calc pI  Real CPK
(pK assumed)                     3.9   4.1   6.0  10.8  12.5   7.0

Rabbit muscle CPK    KIRBCM       28    27    17    34    18     1    6.84      0.0
                                  28    27    17    33    18     1    6.67     -1
                                  28    27    17    32    18     1    6.54     -2
                                  28    27    17    31    18     1    6.42     -3
                                  28    27    17    30    18     1    6.31     -4
                                  28    27    17    29    18     1    6.21     -5
                                  28    27    17    28    18     1    6.12     -6
                                  28    27    17    27    18     1    6.03     -7
                                  28    27    17    26    18     1    5.94     -8
                                  28    27    17    25    18     1    5.85     -9
                                  28    27    17    24    18     1    5.76    -10
                                  28    27    17    23    18     1    5.67    -11
                                  28    27    17    22    18     1    5.58    -12
                                  28    27    17    21    18     1    5.48    -13
                                  28    27    17    20    18     1    5.39    -14
                                  28    27    17    19    18     1    5.29    -15
                                  28    27    17    18    18     1    5.20    -16
                                  28    27    17    17    18     1    5.12    -17
                                  28    27    17    16    18     1    5.04    -18
                                  28    27    17    15    18     1    4.96    -19
                                  28    27    17    14    18     1    4.89    -20
                                  28    27    17    13    18     1    4.83    -21
                                  28    27    17    12    18     1    4.77    -22
                                  28    27    17    11    18     1    4.71    -23
                                  28    27    17    10    18     1    4.66    -24
                                  28    27    17     9    18     1    4.61    -25
                                  28    27    17     8    18     1    4.56    -26
                                  28    27    17     7    18     1    4.52    -27
                                  28    27    17     6    18     1    4.48    -28
                                  28    27    17     5    18     1    4.44    -29
                                  28    27    17     4    18     1    4.40    -30
                                  28    27    17     3    18     1    4.36    -31
                                  28    27    17     2    18     1    4.32    -32
                                  28    27    17     1    18     1    4.29    -33
                                  28    27    17     0    18     1    4.25    -34
                                  28    27    17     0    18     0    4.22    -35

Hb beta, human       HBHU          7     8     9    11     3     1    7.18
                                   7     8     9    10     3     1    6.79
                                   7     8     9     9     3     1    6.53     -1.8
                                   7     8     9     8     3     1    6.32     -3.2
                                   7     8     9     7     3     1    6.13     -5.3
                                   7     8     9     6     3     1    5.96     -7.2
                                   7     8     9     5     3     1    5.78    -10.0
                                   7     8     9     4     3     1    5.59    -12.3
                                   7     8     9     3     3     1    5.37    -15.5
                                   7     8     9     2     3     1    5.14    -18.0
                                   7     8     9     1     3     1    4.91    -21.0
                                   7     8     9     0     3     1    4.71    -25.5
                                   7     8     9     0     3     0    4.54    -27.2
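
The computed pI values tabulated for the carbamylated standards can be approximated from the group counts and the pK values given in the column headings. The following is a minimal sketch, not the authors' spreadsheet: it assumes a simple Henderson-Hasselbalch net-charge model with no C-terminal carboxyl term, and the function names are hypothetical. Under these assumptions the result for unmodified CPK lands close to, but not exactly on, the tabulated value.

```python
# pK values as listed in the table header; the charge model itself
# (independent Henderson-Hasselbalch groups, no C-terminal carboxyl)
# is an assumption made for illustration.
PK_ACIDIC = {"ASP": 3.9, "GLU": 4.1}
PK_BASIC = {"HIS": 6.0, "LYS": 10.8, "ARG": 12.5, "NH2": 7.0}


def net_charge(counts, ph):
    """Net charge of a protein with the given ionizable-group counts at pH `ph`."""
    positive = sum(n / (1.0 + 10.0 ** (ph - PK_BASIC[g]))
                   for g, n in counts.items() if g in PK_BASIC)
    negative = sum(n / (1.0 + 10.0 ** (PK_ACIDIC[g] - ph))
                   for g, n in counts.items() if g in PK_ACIDIC)
    return positive - negative


def compute_pi(counts, lo=0.0, hi=14.0, tol=1e-4):
    """Locate the pH of zero net charge by bisection (charge falls as pH rises)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charge(counts, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Rabbit muscle CPK composition from the first row of the table.
cpk = {"ASP": 28, "GLU": 27, "HIS": 17, "LYS": 34, "ARG": 18, "NH2": 1}
# Carbamylation neutralizes one lysine per step; model it by lowering the count.
cpk_minus_one = dict(cpk, LYS=33)

if __name__ == "__main__":
    print(round(compute_pi(cpk), 2), round(compute_pi(cpk_minus_one), 2))
```

Each removed lysine lowers the computed pI, which is the behavior the charge-standard series relies on.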
Table 4. Computed pI's of some known proteins related to measured CPK pI's

Amino acid pK assumed in calculation: ASP 3.9, GLU 4.1, HIS 6.0, LYS 10.8

No.  Protein Name                                          PIR Name  #ASP  #GLU  #HIS  #LYS  #ARG
0    Creatine phosphokinase (CPK), rabbit muscle           KIRBCM      28    27    17    34    18
1    Fatty acid-binding protein, rat hepatic               FZRTL        5    13     2    16     2
2    b2-microglobulin, human                               MGHUB2       7     6     4     8     5
3    Carbamoyl-phosphate synthase, rat                     SYRTCA      72    96    28    95    56
4    Prealbumin (serum albumin precursor), rat             ABRTS       32    57    15    53    27
5    Serum albumin, rat                                    ABRTS       32    57    15    53    24
6    Superoxide dismutase (Cu-Zn, SOD), rat                A26810       8    11    10     9     4
7    Phospholipase C, phosphoinositide-specific (?), rat   A28807      34    42     9    49    21
8    Albumin, human                                        ABHUS       36    61    16    60    24
9    Apo A-I lipoprotein, rat                              A24700      18    24     6    23    12
10   proApo A-I lipoprotein, human                         LPHUA1      16    30     6    21    17
11   NADPH cytochrome P-450 reductase, rat                 RDRT04      41    60    21    36    36
12   Retinol binding protein, human                        VAHU        18    10
13   Actin beta, rat
14   Actin gamma, rat
15   Apo A-I lipoprotein, human
16   Apo A-IV lipoprotein, human
17   Tubulin alpha, rat
18   F1 ATPase beta, bovine
19   Tubulin beta, pig
20   Protein disulphide isomerase (PDI), rat hepatic
21   Cytochrome b5, rat
22   Apo C-II lipoprotein, human
An updated two-dimensional gel database of rat liver
proteins useful in gene regulation and drug effect
studies

We have improved upon the reference two-dimensional (2-D) electrophoretic
map of rat liver proteins originally published in 1991 (N. L. Anderson et al.,
Electrophoresis 1991, 12, 907-930). A total of 53 proteins (102 spots) are now
identified, many by microsequencing. In most cases, spots cut from wet, Coomassie
Blue stained 2-D gels were submitted to internal tryptic digestion [2],
and individual peptides, separated by high-performance liquid chromatography
(HPLC), were sequenced using a Perkin-Elmer 477A sequenator. Additional
spots were identified using specific antibodies.



Figure 1 shows the current annotated 2-D map of F344 
rat liver, analyzed using the Iso-DALT system (20 X 25 
cm gels) and BDH 4-8 carrier ampholytes. Both the 
map itself and the master spot number system remain 
the same as shown in the original publication. Table 1 
lists the important features of each identification shown, 
including the gel position, pI and Mr for the most
abundant or most basic form of each protein. Using this
extended base of identified spots, a series of four
improved calibration functions has been derived for the
pI and SDS-Mr axes (the first two of which are shown in
Fig. 2A and B). Both forward and reverse functions are 
derived, so that one can compute the physical properties 
of a spot with a given gel location, or inversely compute 
the gel position expected for a protein having given 
physical properties: 



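\[ Y_{\text{RAT LIVER}} = f_{M_r\text{-RAT LIVER Y}}\left(M_{r,\,\text{SEQUENCE-DERIVED}}\right) \tag{1} \]

\[ X_{\text{RAT LIVER}} = f_{pI\text{-RAT LIVER X}}\left(pI_{\text{SEQUENCE-DERIVED}}\right) \tag{2} \]

\[ M_{r,\,\text{GEL-DERIVED}} = f_{\text{RAT LIVER Y-}M_r}\left(Y_{\text{RAT LIVER}}\right) \tag{3} \]

\[ pI_{\text{GEL-DERIVED}} = f_{\text{RAT LIVER X-}pI}\left(X_{\text{RAT LIVER}}\right) \tag{4} \]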
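
The relationship between a forward calibration function (sequence-derived property to gel position) and its reverse can be sketched in code. The example below is an illustration only: it uses the three-parameter form y = a + b exp(-x/c) that Table 2 and Fig. 2B give for rat gel Y as a function of computed Mr, with coefficients copied as they print in Table 2 (treat them as illustrative), and it obtains the reverse direction by inverting that form analytically. The paper instead fits the reverse functions separately, so `mr_from_gel_y` is an assumption swapped in for simplicity, and both function names are hypothetical.

```python
import math

# Coefficients for "Rat gel Y = f(computed Mr)", y = a + b*exp(-x/c),
# as printed in Table 2; illustrative values only.
A, B, C = 178.74803, 1967.7892, 32363.958


def gel_y_from_mr(mr):
    """Forward direction: expected gel Y coordinate for a given molecular mass."""
    return A + B * math.exp(-mr / C)


def mr_from_gel_y(y):
    """Reverse direction by analytic inversion (an assumption; the paper
    fits a separate function). Defined only for A < y < A + B."""
    if not (A < y < A + B):
        raise ValueError("gel Y outside the invertible range of the fit")
    return -C * math.log((y - A) / B)


if __name__ == "__main__":
    y = gel_y_from_mr(30207.0)
    print(round(y, 1), round(mr_from_gel_y(y)))
```

Larger proteins map to smaller Y values under this form, and the inversion is only meaningful inside the range covered by the fitted spots.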



A spreadsheet program (in Microsoft Excel) was developed
to facilitate flexible computation of pI's from
amino acid sequence data, and the results were entered
into a relational database (Microsoft Access). A table of
spot positions and sequence-derived pI's and Mr's was
fitted with a large series of analytic equations using
Tablecurve (Jandel Scientific), and the four conversion
Eqs. (1)-(4), relating computed pI and gel X coordinate,
or computed molecular weight and gel Y coordinate,
were selected, based on criteria of simplicity, goodness
of fit and favorable asymptotic behavior. Table 2 lists the
equations and coefficients. Application of Eqs. (3) and
(4) to a spot's X and Y coordinates, given in [1], produces
improved Mr estimates, and allows computation of pI
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tion, 9620 Medical Center Drive, Rockville, MD 20850-3338 USA (Tel: 
+301-424-5989; Fax: +301-762-4892; email: Ieigh@lsbc.com) 

Keywords: Two-dimensional polyacrylamide gel electrophoresis / Liver 
/ Map / Identification / Calibration 



directly in pH units, instead of in terms of positions relative
to creatine phosphokinase (CPK) charge standards.
The inverse Eqs. (1) and (2) were used to compute the
gel positions of a series of pI and Mr tick marks. These
tick marks were plotted with SigmaPlot (Jandel),
together with fiducial marks locating several prominent
spots, and the resulting graphic was aligned over the synthetic
gel image (computed by Kepler from the master
gel pattern) using Freelance (Lotus Development). Maps
were printed as Postscript output from Freelance, either
in black and white (as shown here) or in color, where
label color indicates subcellular location (available from
the first author upon request). We have also used the rat
liver 2-D pattern as presented here to calibrate the patterns
of other samples. Using mixtures of rat liver and
mouse liver samples, for example, we made composite
2-D patterns that allow use of the rat pattern to standardize
both axes of the mouse pattern. This was accomplished
by deriving transformations relating the rat and
mouse X, and separately the rat and mouse Y, axes
(Table 2, lower half; Fig. 2C and D) based on a series of
spots that coelectrophorese in these closely related species.
These functions were then applied to derive equations
relating the mouse liver Y and X to pI and SDS-Mr
(Eqs. 5 and 6 below). The resulting standardized 2-D pattern
for B6C3F1 mouse liver is shown in Fig. 3.
\[ pI_{\text{MOUSE LIVER}} = f_{\text{RAT LIVER X-}pI}\left(f_{\text{MOUSE LIVER X-RAT LIVER X}}\left(X_{\text{MOUSE LIVER}}\right)\right) \tag{6} \]

A slightly more complex approach can be used to stand- 
ardize samples that have few or no spots co-electropho- 
resing with rat liver proteins. In this case, a 2-D gel is 
prepared with a mixture of the two samples, and four 
functions (forward and backward, each for X and Y) are 
derived relating each sample's own master pattern to the 
composite. The required functions are then applied in a 
nested fashion to yield the desired result (using rat 
plasma as an example): 

\[ M_{r,\,\text{RAT PLASMA}} = f_{\text{RAT LIVER Y-}M_r}\left(f_{\text{RAT PLASMA+LIVER Y-RAT LIVER Y}}\left(f_{\text{RAT PLASMA Y-RAT PLASMA+LIVER Y}}\left(Y_{\text{RAT PLASMA}}\right)\right)\right) \tag{7} \]
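
The nested application of calibration functions described above is ordinary function composition. The sketch below illustrates the idea only: the three mapping functions and all of their coefficients are hypothetical placeholders, not values from the paper, and `compose` is a helper introduced here for illustration.

```python
import math


def compose(*funcs):
    """Return a function that applies `funcs` right-to-left, as in a nested formula."""
    def composed(value):
        for f in reversed(funcs):
            value = f(value)
        return value
    return composed


# Hypothetical placeholder mappings (coefficients invented for illustration):
def plasma_y_to_composite_y(y):
    """Plasma master pattern Y -> plasma+liver composite pattern Y."""
    return 1.02 * y + 5.0


def composite_y_to_liver_y(y):
    """Plasma+liver composite pattern Y -> rat liver master pattern Y."""
    return 0.98 * y - 3.0


def liver_y_to_mr(y):
    """Rat liver master pattern Y -> molecular mass."""
    return -32000.0 * math.log((y - 180.0) / 1970.0)


# Outermost function first, mirroring the order in which the nesting is written.
mr_from_plasma_y = compose(liver_y_to_mr, composite_y_to_liver_y, plasma_y_to_composite_y)

if __name__ == "__main__":
    print(round(mr_from_plasma_y(900.0)))
```

Keeping each sample-to-composite mapping as a separate function means a new sample type needs only its own pair of mappings; the rat liver calibration is reused unchanged.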
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Table 1. Proteins identified in the 2-D pattern of F344 rat liver 



MSN a)                   Protein ID b)  Protein name                                           Identification comments                                                        Gel X     Experimental pI  Gel Y     Experimental Mr

126                      HADO-HUMAN     3-HA-3,4-DO: 3-hydroxyanthranilate-3,4-dioxygenase     Internal sequence                                                              871.95    5.36             921.35    30 207
137, 159, 288, 258       DIDH_RAT       3HDD: 3-hydroxysteroid dihydrodiol reductase           Ab (T. M. Penning) and pure protein                                            1857.52   6.51             822.52    34 406
173                      MUP_RAT        α2u globulin                                           Presence in liver microsome lumen, abundance in kidney, pI, Mr                 919.16    5.43             1313.81   19 549
38                       ACTB_HUMAN     Actin β                                                Analogy with other mammalian patterns (e.g. human) through coelectrophoresis   763.40    5.19             693.64    41 586
68                       ACTG_HUMAN     Actin γ                                                Analogy with other mammalian patterns (e.g. human) through coelectrophoresis   779.42    5.21             692.26    41 677
693                      AFAR_RAT       Aflatoxin B1 aldehyde reductase                        Internal sequence                                                              1993.32   6.72             818.60    34 593
28, 21, 33               ALBU_RAT       Albumin                                                Coelectrophoresis with principal plasma protein                                1262.81   5.86             445.64    66 354
43                       DHAM_RAT       Aldehyde dehydrogenase                                 N-Terminal sequence and AAA                                                    1317.72   5.91             589.03    49 602
96                       ARGI_RAT       Arginase                                               Internal sequence                                                              1730.72   6.34             756.02    37 819
117                      SUAR_RAT       Arylsulfotransferase                                   Internal sequence                                                              1547.96   6.14             849.08    33 186
1163, 1161, 1162, 20     GR78_RAT       BiP (GRP-78)                                           Ab (F. Witzmann)                                                               665.33    5.01             397.39    74 564
185                      CAH3_RAT       CA-III                                                 Uncertain; by comparison with mouse                                            1996.60   6.72             1017.02   26 887
123                      CALM_HUMAN     Calmodulin                                             Analogy with human cellular patterns through coelectrophoresis                 23.05     4.03             1433.25   17 419
3, 201, 48, 39, 22, 24   CRTC_RAT       Calreticulin                                           Ab (Lance Pohl)                                                                310.59    4.34             433.80    68 206
Table 1. continued

MSN" Protein IDb) Protein nam7 



Identification comments 



1184,1186, CPSM RAT 
114, 174, 118 
5, 167, 157 



Gel *° Experimental Gel J*' Experimental 



54, 61 
136 

87 

41 
29 

5, 11 

60 

27 

17 

196 

79 

62, 78 
125 

307 



CATA.RAT 
COX2JUT 



Cart) am yl phosphate 
synthase 

CataJase 
COX-II 



CYB5.RAT Cytochrome B5 



CK-RAT 0 
CK-RAT* 
ENPL-RAH 
ENOA.RAT 
ER60.RAT 
ATPB_RAT 
ATP7.RAT 
F16P.RAT 

DHE3.RAT 
HAST-RAT*' 

HOl.RAT 

HMCS.RAT 



413, 1250, 
933 

133, 144, 235 HMCS.RAT 

8, 23, 1307 HS7CJUT 

15,25,110 P60JUT 



Cytokeraiin 
Cyio keratin 
Endoplasmic 
Enolase A 
ER-60 

Fl ATPase 0 
Fl ATPase 6 

Fructose* 1 ,6- bis- phosphatase 

Glutamate dehydrogenase 
HAST-1: N -hydroxy aryj- 
amine sulfo transferase 
Heme oxygenase 1 

HMG CoA synthase, 

cytosolic 
HMG CoA synthase, 

mitochondria] (frag) 
HSC-70 

HSP-60 



971 HS70-RAr } 
1216, 1215, 90 HS90-RAr» 
256 



415, 734 

80 

227 

134 

18, 35, 226 

175, 251 
1168, 1170, 
1171 
47, 93 

236 
320 

152 

1179, 1180, 
1181, 1182, 
1183 
55, 103 

135 

172 

277, 56 
50, 1225 
1224 



HSP-70 
HSP-90 

INGI-HUMAN Imerferon-y induced 
protein 

LAMB-RAT* Lamin B 



LAMR-RAT*' 
FABL.RAT 



"Lamina receptor* 
L-FABP (liver fatty acid 
binding protein) 
MDHC_MOUS Malate dehydrogenase 
E 

GR75-RAT*' MitconJ; grp75 



2-D of pure protein; comfirmed by 
A'-terminal sequence and AAA 

Internal sequence 
Ab (J. W. Taanman), confirmed by 

internal sequence 
2-D of pure protein; Ab; confirmed 

by AAA 
Location in cytoskeletaJ fraction 
Location in cytoskeletaJ fraction 
Ab (F. Witzmann) 
Internal sequence and AAA 
A^Terminal sequence (R. M. Van Frank) 
^-Terminal sequence and AAA 
Internal sequence 

Uncertain; by comparison with ID in 
Garrison and Wager (JBC 257:13135-13143) 
A'-Terminal sequence and internal sequence 
Internal sequence 

Uncertain; available data from internal 

sequence 
Ab (J. Gennershausen) 

Ab (J. Gennershausen), ^-terminal 

sequence (Steiner/Lottspeich) 
Positional homology (with human, etc.) 

through coeiectropboresis 
Ab (F. Witzman); confirmed by /V-terminal 

sequence and AAA 
Ab (F. Witzman) 
Ab (F. Witzman) 
Internal sequence 

Positional homology with human through 

coeiectropboresis, nuclear location 
Internal sequence 
Ab (N. M. Bass) 

Internal sequence 



1453.56 6.05 

2000.81 6.73 
452.57 4.61 

515.68 4.73 

1165.12 5.75 

743.11 5.15 

567.73 4.83 

1399.78 6.00 

1184.20 5.77 

629.06 4.95 

1227.24 5.82 

924.54 5.44 

1887.39 655 
1297.94 5.89 

1219.39 5.81 

1033.48 5.59 

666.40 5.02 

811.87 5.27 

845.09 5.32 

976.11 5.51 
659.86 5.00 
993.85 554 

737.10 5.14 

534.02 4.77 
1586.09 6.18 

1270.85 5.86 



NCPRJUT 
PDI.RAT 

ALBUJUT 



NADPH P450 reductase 
PDI: Protein disulfide 

isomerase 
Pro-Albumin 



Positional homology with human through 905.67 5.41 

coeiectropboresis 

2-D of pure protein 824.69 5.29 

^-Terminal sequence (R. M. van Frank), Ab 56450 4.83 



APAl.RAT Pro-APO A-I lipoprotein 
IPKLBOVIN Protein kinase C inhibitor 1 

PNPH.MOUSE Purine nucleoside 

pbosphorylase 
PYVC-RAT* Pyruvate carboxylase 



Microsomal lumen location, p/, M f relative 1391.03 5.99 

to albumin 
Coeiectropboresis with plasma protein 
Internal sequence; homology with bovine 

protein 
Internal sequence 



920.41 5.43 
1480.01 6.08 



SM30.RAT 

SODC.RAT 

TPM-RAT*' 

TBALRAT 

TBBLRAT 



SMP-30: Senescence 
marker protein-30 
Superoxide dismutase 

Tm: tropomyosin 

Tubulin a 

Tubulin 3 



VIME_RAT Vimentin 



Tentative; 2-D of pure protein (J. G. 
Henslee, JBQ 1979); reported in Biochim. 
Biophys. Acta 1022, 115—125- 
Internai sequence 

AAA; comfirmed by internal sequence 

(R. M. Van Frank) 
Location in cytoskeleton, 2-D position 

relative to human, Ab 
Positional homology, with human through 
coeiectropboresis, cytoskeletaJ location 
Positional homology with human through 
coeiectropboresis, cytoskeletaJ location 
Posi tonal homology with human through 
coeiectropboresis, cytoskeletaJ location 



1507.19 6.10 
1485.10 6.08 



181.64 160 640 



499.64 58 968 
1062.67 25 504 

137055 18 493 



569.09 
605.23 
263.37 
623.54 
523.51 
588.83 
1184.65 
737.77 



51 448 

48 187 
112 194 
46 674 
56.169 

49 620 
22 310 
38 858 



566.92 51 655 

86155 32 638 

915.71 30 423 

538.13 54 571 

1019.42 26 811 
425.76 69 521 
520.03 56 561 

437.14 67 674 
329 90 107 

1006.04 27 237 

425.19 69 615 

697.62 41 327 

1483.43 16 622 

861.96 32 620 

413.67 71 589 

393.21 75 366 

528.47 55 618 

446.68 66 195 

1137.51 23 467 

1458.81 17 007 

911.16 30 599 

22352 131 589 



721.71 


5.11 


830.10 


34 051 


1161.24 


5.74 


1388.68 


18 173 


476.24 


4.66 


957.86 


28 865 


688.22 


5.06 


537.67 


54 620 


621.29 


4.93 


535.48 


54 855 


673.00 


5.03 


53950 


54 426 
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Table 1. continued 



MSN 



Protein IDb) Protein tame 



113 
104 




Unknown i 

databases 

BBPL.RAT 23 fcDa morphine-binding Internal sequence 
protein 



773J1 5.20 



680.42 42 469 
1182.41 22 363 



a) Master spot number (MSN) from [1] 

b) SwissPROT identifier 

S yUiT.? £ ^,,1?' " m0S ' t bUndant aSSigned S " 01 on ,he ™< master gel pauern 

Abbreviations: AAA. amino acid analysis; Ab, antibody 

Table 2. Equations and coefficients 
Function Equation (0 




Rat gel Y » f( computed .V.t v 
Rat gel X = f( computed p/j > 
Computed M r « f[rat gel y) j 
Computed p/ = flrai gel X) y 



Mouse gel Y = flrai gel Y) 



c - />exp(-x/c) 
at Axt ix/Injr » d/x + */x u 
a -t bxc 

a + bx + ex 5 + dx 3 Inx + ec 3 



>• «= a + Ajt + or 1 - 5 + rfx 0 -* In* + 
ex/lnx 

Mouse gel X = flnu gel Jt) = * + fcr lax + ex" + dx 3 
Rat gel X a flmouse g C l y) .y^a + fc^lnx+cx^+dx 3 
Rat gel X B ffmouse get *) >' = * + Ax + cx 3 Inx + </x" + ex 3 



0.988181021 
0.99247216 
0.9960177 
0.99176499 



0.99951069 
0.99926349 
0.99950032 
0.9992832 



178.74803 1967.7892 32363.958 

-8685665.5 -904497.94 3856926.1 

-8464.5809 19095881 -0.9086255 

4.044686 -O.001 14238 0.0000323 



18276844 -27154534 
-O.0OO0O455 0.00000000176 



11861.44 678.91666 

58.935923 0.00091353 

69.740526 0.00050772 

-198.07189 2.0899063 



-0.78964914 
-0.000213688 
-0.000130392 
-0.000671191 



15673639 
0.00000159 
0.00000116 
0.000145189 



-6953.9592 



-0.000000986 



Figure 2. Calibration functions for identified proteins (square symbols): (A) gel X position versus pI
computed from sequence data for identified spots in F344 rat liver, fitted with
y = a + bx + cx/ln x + d/x + e/x^1.5; (B) gel Y position versus Mr computed from sequence data for
identified spots in F344 rat liver, fitted with y = a + b exp(-x/c); (C) rat liver gel X position versus
B6C3F1 mouse liver gel X position for coelectrophoresing spots, fitted with
y = a + bx + cx^2 ln x + dx^2.5 + ex^3; (D) rat liver gel Y position versus B6C3F1 mouse liver gel Y
position for coelectrophoresing spots, fitted with y = a + bx^2 ln x + cx^2.5 + dx^3. In each case,
inverse equations were also computed (Table 2).
\[ pI_{\text{RAT PLASMA}} = f_{\text{RAT LIVER X-}pI}\left(f_{\text{RAT PLASMA+LIVER X-RAT LIVER X}}\left(f_{\text{RAT PLASMA X-RAT PLASMA+LIVER X}}\left(X_{\text{RAT PLASMA}}\right)\right)\right) \tag{8} \]

This unified approach, in which one well-populated 2-D
pattern is used to standardize a family of other patterns,
has the additional advantage that the resulting pI and Mr
scales are directly compatible. Hence one can compare
the relative pI's of mouse and rat versions of a sequenced
protein in a consistent pI measurement system,
and select likely inter-species analogs based on positional
relationships on common scales. Adoption of
immobilized pH gradient (IPG) technology [4-7] will
result in substantial improvements in pI positional
reproducibility for standard 2-D maps such as those presented
here; however, we believe that our approach will
continue to be useful in establishing the empirical pH
gradient actually achieved by such gels under given
experimental conditions (temperature, urea concentration,
etc.), and in relating patterns run on different IPG
ranges and using different lots of IPG gels (between
which some variation will persist). Development of
rodent organ maps is a continuing effort in our laboratories
[8-10], and results in regular additions of identified
proteins. Those who wish to receive current rodent liver
maps, with color annotations, should send a stamped
self-addressed envelope to the first author.



We would like to thank the individuals who provided antibodies
mentioned in Table 1, and R. M. van Frank for unpublished
sequence data.
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identify all cDNA species, and the approach does not easily allow a systematic 
screening. Analysis of gene expression by the study of proteins present in a cell or 
tissue presents a favorable alternative. This can be achieved by use of two-dimensional 
(2-D) gel electrophoresis, quantitative computer image analysis, and protein identifi- 
cation techniques to create 'reference maps' of all detectable proteins. Such reference 
maps establish patterns of normal and abnormal gene expression in the organism, and 
allow the examination of some post-translational protein modifications which are 
functionally important for many proteins. It is possible to screen proteins systemati- 
cally from reference maps to establish their identities. 

To define protein-based gene expression analysis, the concept of the 'proteome'
was recently proposed (Wilkins et al., 1995; Wasinger et al., 1995). A proteome is the
entire PROTein complement expressed by a genOME, or by a cell or tissue type. The
concept of the proteome has some differences from that of the genome, as while there 
is only one definitive genome of an organism, the proteome is an entity which can 
change under different conditions, and can be dissimilar in different tissues of a single 
organism. A proteome nevertheless remains a direct product of a genome. Interest- 
ingly, the number of proteins in a proteome can exceed the number of genes present, 
as protein products expressed by alternative gene splicing or with different post- 
translational modifications are observed as separate molecules on a 2-D gel. As an 
extrapolation of the concept of the 'genome project', a 'proteome project' is research
which seeks to identify and characterise the proteins present in a cell or tissue and 
define their patterns of expression. 

Proteome projects present challenges of a similar magnitude to that of genome 
projects. Technically, the 2-D gel electrophoresis must be reproducible and of high 
resolution, allowing the separation and detection of the thousands of proteins in a cell. 
Low copy number proteins should be detectable. There should be computer gel image 
analysis systems that can qualitatively and quantitatively catalog the electrophoretically 
separated proteins, to form reference maps. A range of rapid and reliable techniques 
must be available for the identification and characterisation of proteins. As a conse- 
quence of a proteome project, protein databases must be assembled that contain 
reference information about proteins. Such databases must be linked to genomic
databases and protein reference maps. Databases should be widely accessible and easy 
to use. 

Recently, there have been many changes in the techniques and resources available 
for the analysis of proteomes. It is the aim of this chapter to discuss the status of the 
areas outlined above, and to review briefly the progress of some current proteome 
projects. 

Two-dimensional electrophoresis of proteomes 

Two dimensional (2-D) gel electrophoresis involves the separation of proteins by their 
isoelectric point in the first dimension, then separation according to molecular weight 
by sodium dodecyl sulfate electrophoresis in the second dimension. Since first 
described (Klose, 1975; O'Farrell, 1975; Scheele, 1975), it has become the method of
choice for the separation of complex mixtures of proteins, albeit with many modifica- 
tions to the original techniques. 2-D electrophoresis forms the basis of proteome 
projects through separating proteins by their size and charge (Hochstrasser et al, 
Figure 1. Two-dimensional gel electrophoresis map of a human hepatoblastoma-derived cell line,
illustrating the very high resolution of the technique.

1992; Celis et al., 1993; Garrels and Franza, 1989; VanBogelen et al., 199?). Current
protocols can resolve two to three thousand proteins from a complex sample on a
single gel (Figure 1).



2-D GEL RESOLUTION AND REPRODUCIBILITY 

A primary challenge of separating complex mixtures of proteins by 2-D gel
electrophoresis has been to achieve high resolution and reproducibility. High resolution
ensures that a maximum of protein species are separated, and high reproducibility is
vital to allow comparison of gels from day to day and between research sites. These
factors can be difficult to achieve.

Carrier ampholytes are a common means of isoelectric focusing for the first
dimension of 2-D electrophoresis. Gels are usually focused to equilibrium to separate
proteins in the pI range 4 to 8, and run in a non-equilibrium mode (NEPHGE) to
separate proteins of higher pI (7 to 11.5) (O'Farrell, 1975; O'Farrell, Goodman and
O'Farrell, 1977). Unfortunately, the use of carrier ampholytes in the isoelectric
focusing procedure is susceptible to 'cathode drift', whereby pH gradients established
by prefocusing of ampholytes slowly change with time (Righetti and Drysdale, 1973).
Carrier ampholyte pH gradients are also distorted by high salt concentration of
samples (Bjellqvist et al., 1982), and by high protein load (O'Farrell, 1975). A further
limitation is that isoelectric focusing gels, which are cast and subject to
electrophoresis in narrow glass tubes, need to be extruded by mechanical means before
application to the second dimension - a procedure that potentially distorts the gel.
Nevertheless, many of the above shortcomings can be avoided by loading small amounts
of 14C or 35S radiolabeled samples (Garrels, 1989; Neidhardt et al., 1989;
Vandekerkhove et al., 1990). High sensitivity detection is then achieved through use of
fluorography or phosphorimaging plates (Bonner and Laskey, 1974; Johnston, Pickett and
Barker, 1990; Patterson and Latter, 1993). However, this approach is only practicable for
organisms or tissues that can be radiolabeled.

An alternative technique, which is becoming the method of choice for the first
dimension separation of proteins, involves isoelectric focusing in immobilized pH
gradient (IPG) gels (Bjellqvist et al., 1982; Gorg, Postel and Gunther, 1988; Righetti,
1990). Immobilized pH gradients are formed by the covalent coupling of the pH
gradient into an acrylamide matrix, creating a gradient that is completely stable with
time. IPG gels are usually poured onto a stiff backing film, which is mechanically
strong and provides easy gel handling (Ostergren, Eriksson and Bjellqvist, 1988). The
major advantages of IPG separations are that they do not suffer from cathodic drift,
they allow focusing of basic and very acidic proteins to equilibrium, pH gradients can
be precisely tailored (linear, stepwise, sigmoidal), and separations over a very
narrow pH range are possible (0.05 pH units per cm) (Righetti, 1990; Bjellqvist et al.,
1982, 1993a; Sinha et al., 1990; Gorg et al., 1988; Gelfi et al., 1987; Gunther et al.,
1988). However, it is not currently possible to use IPG gels to separate very basic
proteins of isoelectric point greater than 10, although this is under development.
Narrow pH range separations are useful to address problems of protein co-migration
in complex samples, allowing 'zooming in' on regions of a gel (Figure 2). IPG gel
strips are now commercially available, which begin to address the problems of intra-
and inter-lab isoelectric focusing reproducibility.

There are two means of electrophoresis for the second dimension separation of
proteins: vertical slab gels and horizontal ultrathin gels (Gorg, Postel and Gunther,
1988). Both are usually SDS-containing gradient gels of approximately 11% to 15%
acrylamide, which separate proteins in the molecular mass range of 10 - 150 kD. A
stacking gel is not usually used with slab gels, but is necessary when using horizontal
gel setups (Gorg, Postel and Gunther, 1988). Comparisons have shown that there is
little or no difference in the reproducibility of electrophoresis using either approach
(Corbett et al., 1994a), but commercially available vertical or horizontal precast gels
will provide greater reproducibility for occasional users. For slab gel electrophoresis,
the use of piperazine diacrylyl as a gel crosslinker and the addition of thiosulfate in the
catalyst system has been shown to give better resolution and higher sensitivity
detection (Hochstrasser and Merril, 1988; Hochstrasser, Patchornik and Merril,
1988).

Figure 2. Two-dimensional gel electrophoresis allows 'zooming in' on areas of interest. Rings highlight
2 proteins common to each gel. (A) Wide pI range two dimensional electrophoresis map of human plasma
proteins. First dimension separation was achieved using an immobilised pH gradient of 3.5 to 10.0 units.
The second dimension was SDS-PAGE. Actual gel size was 16cm x 20cm, and proteins were visualised
with silver staining. (B) Narrow pI range electrophoresis was used to 'zoom in' on a small region of the
plasma map. The first dimension used a narrow range immobilised pH gradient of 4.2 to 5.2 units, and
second dimension was SDS-PAGE. Micropreparative loading was used, and the gel blotted to PVDF.
Proteins were visualised with amido black. Actual blot size was 16cm x 20cm.

Notwithstanding the advances described above, there is an increasing demand to
improve the reproducibility of 2-D electrophoresis to facilitate database construction
and proteome studies. Harrington et al. (1993) explain that if a gel resolves 4000
protein spots, and there is 99.5% spot matching from gel to gel, this will produce 20
spot errors per gel. This amount of error, which might accumulate with each gel to gel
comparison used in database construction, could produce an unacceptable degree of
uncertainty in gel databases. To address these issues, partial automation of large 2-D
gel separations has been undertaken (Nokihara, Morita and Kuriki, 1992; Harrington
et al., 1993). Although results are preliminary, spot to spot positional reproducibility
in one study was found to be threefold improved over manual methods (Harrington et
al., 1993). It should be noted that small 2-D gel formats (50 x 43 mm) have been
almost completely automated (Brewer et al., 1986), although these are not generally
used for database studies.
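
The arithmetic behind the spot-matching example above can be sketched in a few lines of Python. The function names are illustrative only, and the accumulation rule (errors from successive comparisons never coincide and simply add up) is a pessimistic assumption made for this sketch, not a model taken from Harrington et al. (1993).

```python
def expected_mismatches(n_spots: int, match_rate: float) -> float:
    """Expected number of mismatched spots in one gel-to-gel comparison."""
    return n_spots * (1.0 - match_rate)


def accumulated_mismatches(n_spots: int, match_rate: float, n_comparisons: int) -> float:
    """Pessimistic accumulated mismatches over several comparisons.

    Assumption (hypothetical): mismatches never fall on the same spot twice,
    so they add linearly, capped at the number of spots on the gel.
    """
    return min(n_spots, n_comparisons * expected_mismatches(n_spots, match_rate))


# 4000 spots matched at 99.5% gives about 20 mismatched spots per comparison.
print(round(expected_mismatches(4000, 0.995)))
# Ten chained comparisons under the additive assumption.
print(round(accumulated_mismatches(4000, 0.995, 10)))
```

Under this assumption the uncertainty grows in proportion to the number of gels chained together, which is why even a small per-gel mismatch rate matters for database construction.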

MICROPREPARATIVE 2-D GEL ELECTROPHORESIS 

With the advent of affordable protein microcharacterisation techniques, including
N-terminal microsequencing, amino acid analysis, peptide mass fingerprinting, phosphate
analysis and monosaccharide compositional analysis, a new challenge for 2-D
electrophoresis has been to maintain high resolution and reproducibility but to provide
protein in sufficient quantities for chemical analysis (high nanogram to low microgram
quantities of proteins per spot). This becomes difficult to achieve with very complex
samples such as whole bacterial cells, as the initial protein load is divided among 2000
to 4000 protein species. Two approaches are used for producing amounts of material
that can be chemically characterised. The first method is to run multiple gels, collect
and pool the spots of interest, and subject them to concentration (Ji et al., 1994; Walsh
et al., 1995; Rasmussen et al., 1992). In this approach, the concentration process must
also act as a purification step to remove accumulated electrophoretic contaminants
such as glycine. A more elegant approach has been to exploit the high loading capacity
of IPG isoelectric focusing. The high loading capacity of immobilised pH gradients
was described early (Ek, Bjellqvist and Righetti, 1983), but has only recently been
applied to 2-D electrophoresis (Hanash et al., 1991; Bjellqvist et al., 1993b). Up to 15
mg of protein can be applied to a single gel, yielding microgram quantities of
hundreds of protein species. A further benefit of this approach is that proteins present in
low abundance, which may not be visualised by lower protein loads, are more likely
to be detected. The use of electrophoretic or chromatographic prefractionation
techniques (Hochstrasser et al., 1991a; Harrington et al., 1992), followed by high loading
of narrow-range IPG separations (Bjellqvist et al., 1993b) provides a likely solution to
studies on proteins present in low abundance.
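
As a rough illustration of why loading capacity matters, the sketch below computes the average amount of protein per species if a load were shared equally among all species on the gel. Equal sharing is an assumption made purely for illustration (real abundances span orders of magnitude), and the 0.1 mg analytical load is a hypothetical comparison value, not a figure from the text.

```python
def mean_yield_per_species_ug(load_mg: float, n_species: int) -> float:
    """Average protein per species in micrograms, assuming equal sharing."""
    if n_species <= 0:
        raise ValueError("n_species must be positive")
    return load_mg * 1000.0 / n_species


# Compare a hypothetical analytical load with the 15 mg IPG load from the text.
for load_mg in (0.1, 15.0):
    for n_species in (2000, 4000):
        avg = mean_yield_per_species_ug(load_mg, n_species)
        print(f"{load_mg} mg over {n_species} species: {avg:.3f} ug average per species")
```

Only the high load puts the average species in the microgram range required for chemical characterisation; at the lower load the average falls to tens of nanograms.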

Methods of protein detection 

There are many means for detecting proteins from 2-D gels. The method used will be 
dictated by factors including protein load on gel (analytical or preparative), the 
purpose of the gel (for protein quantitation or for blotting and chemical characterisa- 
tion), and the sensitivity required. The most common means of protein detection and 
their applications are shown in Table 1. Most detection methods have drawbacks; for
example, some glycoproteins are not stained by coomassie blue (Goldberg et al.,
1988), and many organic dyes are unsuitable for protein detection on PVDF if samples
are to be used for direct matrix-assisted laser desorption ionisation mass spectrometry
(Strupat et al., 1994).

Table 1: Common stains for 2-D gels or blots and their applications.

Detection method:         [35S]Met or 14C radiolabelling and fluorography or phosphorimaging
Main applications:        Cell lines, cultured organisms
Unsuitable applications:  Samples that cannot be labelled
Sensitivity:              20 ppm of radiolabel in a spot
References:               Garrels and Franza, 1989; Latham, Garrels and Solter, 1993

Detection method:         [35S]thiourea silver
Main applications:        Extremely high sensitivity gel staining
Unsuitable applications:  Preparative 2-D; PVDF or NC membranes
Sensitivity:              0.4 ng protein on spot or band of gel
References:               Wallace and Saluz, 1992a,b

Detection method:         Silver
Main applications:        Very high sensitivity gel staining, can be mono or polychromatic
Unsuitable applications:  Preparative 2-D; PVDF or NC membranes
Sensitivity:              4 ng protein on spot or band of gel
References:               Rabilloud, 1992; Hochstrasser and Merril, 1988

Detection method:         Coomassie blue R-250
Main applications:        Staining of gels; staining of PVDF membranes before protein sequencing
Unsuitable applications:  Staining prior to direct mass determination from PVDF; amino acid
                          analysis on PVDF; detection of some glycoproteins
Sensitivity:              40 ng protein on band or spot of gel
References:               Strupat et al., 1994; Gharahdaghi et al., 1992; Goldberg et al., 1988;
                          Sanchez et al., 1992

Detection method:         Colloidal gold
Main applications:        Staining NC membranes, staining PVDF before direct MALDI-TOF
Unsuitable applications:  Gels
Sensitivity:              60 x higher than coomassie
References:               Yamaguchi and Asakawa, 1988; Eckerskorn et al., 1992; Strupat et al., 1994

Detection method:         Zinc imidazole
Main applications:        Reverse staining of gels or membranes; may be beneficial in MALDI-TOF
                          of peptides
Unsuitable applications:  Where positive image is required
Sensitivity:              Higher than coomassie
References:               Ortiz et al., 1992; James et al., 1993

Detection method:         Ponceau S and amido black
Main applications:        Staining higher protein loads on PVDF, for protein sequencing or amino
                          acid analysis
Unsuitable applications:  Staining prior to direct mass determination from PVDF
Sensitivity:              100 ng protein on band or spot of gel
References:               Sanchez et al., 1992; Strupat et al., 1994; Wilkins et al., 1995

Detection method:         India ink
Main applications:        Staining of membrane-bound proteins; staining PVDF before direct MALDI-TOF
Unsuitable applications:  Gel staining; not quantitative from protein to protein
Sensitivity:              1 - 10 ng
References:               Li et al., 1989; Hughes, Mack and Hamparian, 1988; Strupat et al., 1994

Detection method:         Stains-all
Main applications:        Staining to detect glycoproteins or Ca2+ binding proteins
Unsuitable applications:  General gel staining
Sensitivity:              100 ng protein on band or spot of gel
References:               Campbell, MacLennan and Jorgensen, 1983; Goldberg et al., 1988

PVDF = polyvinylidene difluoride; NC = nitrocellulose; MALDI-TOF = matrix assisted laser desorption
ionisation time of flight mass spectrometry.

Although most means of protein detection give some indication of the quantities of
protein present, in general they cannot be used for global quantitation. This is because
no protein stain is able consistently to detect proteins over a wide range of
concentrations, isoelectric points and amino acid compositions, and with a variety of
post-translational modifications (Goldberg et al., 1988; Li et al., 1989). Furthermore,
there are large differences in staining pattern when identical gels or blots are subjected
to different stains, including amido black, imidazole zinc, india ink, ponceau S,
colloidal gold, or coomassie blue (Tovey, Ford and Baldo, 1987; Ortiz et al., 1992).
The most common means of quantitating large numbers of proteins in a 2-D gel
involves the radiolabelling of protein samples prior to electrophoresis, and protein
quantitation based on fluorography and image analysis or liquid scintillation counting
(Garrels, 1989; Celis and Olsen, 1994). However, proteins which do not contain
methionine cannot be detected if only [35S]methionine is used for labelling. Amino
acid analysis of protein spots visualised by other techniques presents a likely means of
protein quantitation for the future.

BLOTTING OF PROTEINS TO MEMBRANES 

Electrophoretic blotting of proteins from two-dimensional polyacrylamide gels to
membranes presents many options for protein identification and microcharacterisation
which are not possible when proteins remain in gels. For example, when proteins are
blotted to polyvinylidene difluoride (PVDF) membranes, they can be identified by
N-terminal sequencing, amino acid analysis, or immunoblotting, or they may be subjected
to endoproteinase digestion, monosaccharide analysis, phosphate analysis, or direct
matrix-assisted laser desorption ionisation mass spectrometry (Matsudaira, 1987;
Wilkins et al., 1995; Jungblut et al., 1994; Sutton et al., 1995; Rasmussen et al., 1994;
Weitzhandler et al., 1993; Murthy and Iqbal, 1991; Eckerskorn et al., 1992). It is
possible to combine some of these procedures on a single protein spot on a PVDF
membrane (Packer et al., 1995; Wilkins et al., submitted; Weitzhandler et al., 1993).
This is useful when minimal amounts of protein are available for analysis. These
techniques will be explored in detail later in this review. Notwithstanding the above,
there are some disadvantages associated with blotting of proteins to membranes.
There is always loss of sample during blotting procedures (Eckerskorn and Lottspeich,
1993), and common protein detection methods are less sensitive or not applicable to
membranes (Table 1), presenting difficulties for the analysis of low abundance
proteins. Detailed discussion of the merits of available membranes and common
blotting techniques can be found elsewhere (Eckerskorn and Lottspeich, 1993; Strupat
et al., 1994; Patterson, 1994).

2-D gel analysis, documentation, and proteome databases 

Following protein electrophoresis and detection, detailed analysis of gel images is 
undertaken with computer systems. For proteome projects, the aim of this analysis is 
to catalogue all spots from the 2-D gel in a qualitative and if possible quantitative 
manner, so as to define the number of proteins present and their levels of expression. 
Reference gel images, constructed from one or more gels, form the basis of
two-dimensional gel databases. These databases also contain protein spot identities and
details of their post-translational modifications. 2-D gel databases are beginning to be
linked to or integrated with comprehensive protein and nucleic acid databases
(Neidhardt et al., 1989; Simpson et al., 1992; Appel et al., 1994), and 'organism'
databases, containing DNA sequence data, chromosomal map locations, reference
2-D gels and protein functional information for an organism, are becoming established
as genome and proteome projects progress (VanBogelen et al., 1992; Yeast Protein
Database cited in Garrels et al., 1994).



GEL IMAGE ANALYSIS AND REFERENCE GELS 

After 2-D electrophoresis and protein visualisation by staining, fluorography or
phosphorimaging, images of gels are digitised for computer analysis by an image
scanner, laser densitometer, or charge-coupled device (CCD) camera (Garrels, 1989;
Celis et al., 1990a; Urwin and Jackson, 1993). All systems digitise gels with a
resolution of 100 - 200 µm, and can detect a wide range of densities or shading (256
or more 'grey scales'). Following this, gel images are subjected to a series of
manipulations to remove vertical and horizontal streaking and background haze, to detect
spot positions and boundaries, and to calculate spot intensity (Figure 3). A standard
spot (SSP) number, containing vertical and horizontal positional information, is
assigned to each detected spot and becomes the protein's reference number. Table 2
lists some notable software packages which process 2-D gel images.
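
The processing steps described above (background removal, spot detection, intensity calculation) can be illustrated with a deliberately simplified Python sketch. It is not the algorithm of any package in Table 2: the constant background, the fixed threshold, the 4-connected flood fill and all names are assumptions chosen for brevity.

```python
from collections import deque


def detect_spots(image, background, threshold):
    """Find spots in a grey-scale image given as a list of rows of numbers.

    Returns one dict per spot with its area (pixels), volume (integrated
    intensity above background) and intensity-weighted centroid (row, col).
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    rows, cols = len(image), len(image[0])
    # Subtract a constant background and clip negative values to zero.
    corrected = [[max(0, v - background) for v in row] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    spots = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or corrected[r][c] < threshold:
                continue
            # Flood fill over 4-connected neighbours to collect one spot.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and corrected[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            volume = sum(corrected[y][x] for y, x in pixels)
            cy = sum(y * corrected[y][x] for y, x in pixels) / volume
            cx = sum(x * corrected[y][x] for y, x in pixels) / volume
            spots.append({"area": len(pixels), "volume": volume, "centroid": (cy, cx)})
    return spots


# A tiny synthetic "gel" with two spots on a uniform background of 10.
gel = [
    [10, 10, 10, 10, 10, 10],
    [10, 60, 80, 10, 10, 10],
    [10, 70, 90, 10, 10, 40],
    [10, 10, 10, 10, 10, 50],
]
for spot in detect_spots(gel, background=10, threshold=20):
    print(spot)
```

Real systems additionally model streaks, uneven background and overlapping spots; the centroid computed here is the kind of positional information from which a spot reference number could be derived.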



Table 2: Some Software Packages for the Analysis of Gel Images.

Gel Image Analysis System      References*

ELSIE 4 & 5                    Olsen and Miller, 1988; Wirth et al., 1991; Wirth et al., 1993.
GELLAB I & II                  Wu, Lemkin and Upton, 1993; Lemkin, Wu and Upton, 1993;
                               Myrick et al., 1993.
MELANIE I & II                 Appel et al., 1991; Hochstrasser et al., 1991b.
QUEST I & II and PDQUEST       Garrels, 1989; Monardo et al., 1994; Holt et al., 1992;
                               Celis et al., 1990a,b.
TYCHO & KEPLER                 Anderson et al., 1984; Richardson, Horn and Anderson, 1994.

* These references are not exhaustive; they include some references of use as well as authors of the
system.



As there are difficulties in the electrophoresis of samples with 100% reproducibility,
reference gel images are often constructed from many gels of the same sample
(Garrels and Franza, 1989; Neidhardt et al., 1989). Since this involves the matching of
2000 to 4000 proteins from one gel to another, it presents a considerable challenge to
image analysis systems. Matching of gels is usually initiated by an operator, who
manually designates approximately 50 or so prominent spots as 'landmarks' on gels
to be cross-matched. Proteins which match are then established around landmarks,
using computer-based vector algorithms to extend the matching over the entire gel.
Close to 100% of spots from complex samples can be matched by these methods,
although different degrees of operator intervention may be required (Olsen and Miller,
1988; Lemkin and Lester, 1989; Garrels, 1989; Myrick et al., 1993).

Figure 3. Computer processing of gel images. Shown is a wide pI range 2-D separation of human liver
proteins, processed by Melanie software (Appel et al., 1991). (A) Original gel image as captured by laser
densitometer. (B) Gel image after processing to remove streaking and background. (C) Outline definition
of all spots on the gel.

CALCULATION OF PROTEIN ISOELECTRIC POINT AND MOLECULAR WEIGHT

Estimation of the isoelectric point (pI) and molecular weight (MW) of proteins from
2-D gels provides fundamental parameters for each protein, which are also of use
during identification procedures (see following section). The pI and MW of proteins
are recorded in 2-D gel databases. Accurate estimations of protein pI and MW can be
obtained by using 20 or more known proteins on a reference map to construct standard
curves of pI and molecular weight, which are then used to calculate estimated pI and
MW of unknown proteins (Neidhardt et al., 1989; Garrels and Franza, 1989;
VanBogelen, Hutton and Neidhardt, 1990; Anderson and Anderson, 1991; Anderson et
al., 1991; Latham et al., 1992). Alternatively, the MW of individual proteins blotted
to PVDF can be determined very accurately by direct mass spectrometry (Eckerskorn
et al., 1992). Where immobilised pH gradients are used, the focusing position of
proteins allows their pI to be measured within 0.15 units of that calculated from the
amino acid sequence (Bjellqvist et al., 1993c). It must be noted, however, that proteins
carrying post-translational modifications may migrate to unexpected pI or MW
positions during electrophoresis (Packer et al., 1995).
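
One way to picture the standard-curve approach is the Python sketch below. It assumes, for illustration only, that log10(MW) falls linearly with second-dimension migration distance and that pI can be interpolated linearly between neighbouring markers along the first dimension; the function names and the marker values in the usage example are hypothetical, and a real curve would use 20 or more known proteins as described above.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_mw_curve(markers):
    """markers: (migration_mm, mw_kd) pairs. Fits log10(MW) against migration."""
    slope, intercept = fit_line([m for m, _ in markers],
                                [math.log10(mw) for _, mw in markers])
    return lambda migration_mm: 10 ** (slope * migration_mm + intercept)


def make_pi_curve(markers):
    """markers: (position_mm, pI) pairs at distinct positions.

    Returns a function doing piecewise-linear interpolation between markers.
    """
    points = sorted(markers)

    def estimate(position_mm):
        if not points[0][0] <= position_mm <= points[-1][0]:
            raise ValueError("position outside the calibrated range")
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= position_mm <= x1:
                return y0 + (y1 - y0) * (position_mm - x0) / (x1 - x0)

    return estimate


# Hypothetical marker proteins, for illustration only.
estimate_mw = make_mw_curve([(10.0, 100.0), (60.0, 10.0)])
estimate_pi = make_pi_curve([(0.0, 4.0), (100.0, 8.0)])
print(f"MW at 35 mm: {estimate_mw(35.0):.1f} kD; pI at 50 mm: {estimate_pi(50.0):.2f}")
```

Refusing to extrapolate outside the marker range is a design choice for the sketch: estimates beyond the calibrated region are the ones most likely to be misleading, particularly for modified proteins that migrate anomalously.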



SPOT QUANTITATION AND EXPRESSION ANALYSIS 

A major challenge faced in proteome projects is the quantitative analysis of proteins
separated by 2-D electrophoresis. The most accurate means of protein quantitation is
to determine chemically the amount of each protein present by amino acid
compositional analysis. However, the current method of choice for quantitative analysis
of many proteins is to radiolabel samples with [35S]methionine or 14C amino acids,
perform the 2-D electrophoresis, and measure protein levels in disintegrations per
minute (dpm) or units of optical density. Quantitation is achieved either by liquid
scintillation counting, or by gel image analysis where spot densities are quantified
by reference to gel calibration strips containing known amounts of radiolabelled
protein or against the integrated optical density of all spots visualised (Vandekerkhove
et al., 1990; Celis et al., 1990b; Celis and Olsen, 1994; Garrels, 1989; Latham,
Garrels and Solter, 1993; Fey et al., 1994). All approaches effectively allow spots to
[...] disintegrations per minute loaded onto the gel.
Limitations that remain with radiolabelling methods are that absolute quantitation is
not achieved because all proteins have varying amounts of any amino acid, and that
only easily labelled samples can be investigated. Quantitative silver staining presents
an alternative (Giometti et al., 1991; Harrington et al., 1992; Rodriguez et al., 1993;
Myrick et al., 1993), which when undertaken with [35S]thiourea (Wallace and Saluz,
1992a,b) is of extremely high sensitivity.
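
Normalisation against the integrated signal of all spots, one of the quantitation schemes mentioned above, can be sketched as follows. The function name, the parts-per-million scale and the spot identifiers in the example are illustrative assumptions; the input values could equally be dpm or integrated optical densities.

```python
def spots_as_ppm(spot_signals):
    """Express each spot as parts per million of the total detected signal.

    spot_signals maps a spot identifier to its measured signal
    (dpm or integrated optical density).
    """
    total = sum(spot_signals.values())
    if total <= 0:
        raise ValueError("total signal must be positive")
    return {spot: 1e6 * signal / total for spot, signal in spot_signals.items()}


# Hypothetical spot identifiers and signals.
signals = {"1101": 500.0, "2305": 1500.0, "4410": 2000.0}
for spot, ppm in spots_as_ppm(signals).items():
    print(spot, round(ppm))
```

Values normalised in this way are relative, not absolute: as the text notes, a protein's share of the label depends on how many labelled residues it contains.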

When protein spots from samples prepared under different conditions are quantitated
and matched from gel to gel, it becomes possible to examine changes and patterns in
protein expression. Large scale investigation of up- and down-regulation of proteins,
their appearance and disappearance, can be undertaken. For example, simian virus 40
transformed human keratinocytes were shown to have 177 up-regulated and 58
down-regulated proteins compared to normal keratinocytes (Celis and Olsen, 1994); detailed
synthesis profiles of 1200 proteins have been established in 1 to 4 cell mouse embryos
(Latham et al., 1991, 1992); and 4 proteins out of 1971 were found to be markers for
cadmium toxicity in urinary proteins (Myrick et al., 1993). Complex global changes
in protein expression as a result of gene disruptions have also been investigated (S. Fey
and P. Mose-Larsen, personal communication). Impressively, large gel sets showing
protein expression under different conditions can be globally investigated using
statistical methods that find groups of related objects within a set. For example, the
REF52 rat cell line database, consisting of 79 gels from 12 experimental groups where
each gel contains quantitative data for 1600 cross-matched proteins, has been analysed
by cluster analysis (Garrels et al., 1990). This revealed clusters of proteins that, for
example, were induced or repressed similarly under simian virus 40 or adenovirus
transformation, suggesting a common mechanism. Protein groups that were induced
or repressed during culture growth to confluence were also found. It is obvious that the
potential for investigation of cellular control mechanisms by these approaches is
immense. It is equally clear that investigations of gene expression of this scale are
currently technically impossible using nucleic-acid based techniques.
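
The up- and down-regulation comparisons described above amount to classifying matched spots by the ratio of their quantities under two conditions. The Python sketch below shows one simple way to do this; the two-fold cut-off, the function name and the spot identifiers are assumptions made for illustration and are not taken from the cited studies.

```python
def classify_regulation(control, treated, fold=2.0):
    """Classify matched spots by the treated/control ratio.

    control and treated map spot identifiers to normalised quantities.
    A spot absent from one gel is treated as having quantity zero.
    """
    result = {"up": [], "down": [], "unchanged": [], "appeared": [], "disappeared": []}
    for spot in sorted(set(control) | set(treated)):
        c = control.get(spot, 0.0)
        t = treated.get(spot, 0.0)
        if c == 0 and t == 0:
            result["unchanged"].append(spot)
        elif c == 0:
            result["appeared"].append(spot)
        elif t == 0:
            result["disappeared"].append(spot)
        elif t / c >= fold:
            result["up"].append(spot)
        elif t / c <= 1.0 / fold:
            result["down"].append(spot)
        else:
            result["unchanged"].append(spot)
    return result


# Hypothetical matched spot quantities from two gels.
normal = {"1101": 100.0, "2305": 50.0, "3307": 80.0, "4410": 20.0}
transformed = {"1101": 250.0, "2305": 20.0, "3307": 90.0, "5512": 30.0}
print(classify_regulation(normal, transformed))
```

Cluster analysis of the kind applied to the REF52 database goes further, grouping proteins whose quantities change in parallel across many conditions rather than comparing one pair of gels.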



Table 3: Some proteome databases and their special features

Proteome database:  E. coli gene-protein database
Special features:   Gel spots linked with GenBank and Kohara clones; quantitative spot
                    measurements under different growth conditions
References:         VanBogelen and Neidhardt, 1991; VanBogelen et al., 1992

Proteome database:  Human heart databases
Special features:   Identification of disease markers; two separate databases have been
                    established
References:         Baker et al., 1992; Corbett et al., 1994b; Jungblut et al., 1994

Proteome database:  Human keratinocyte database
Special features:   Extensive identifications; quantitative spot measurements of transformed
                    cells; identification of disease markers
References:         Celis et al., 1990a; Celis et al., 1993; Celis and Olsen, 1994

Proteome database:  Mouse embryo database
Special features:   Quantitative spot measurements through 1 to 4 cell stage
References:         Latham et al., 1991; Latham et al., 1992

Proteome database:  Mouse liver database (Argonne Protein Mapping Group)
Special features:   Documents changes due to exposure to ionizing radiation and toxic chemicals
References:         Giometti, Taylor and Tollaksen, 1992

Proteome database:  Rat liver epithelial database
Special features:   Detailed subcellular fractionation studies
References:         Wirth et al., 1991; Wirth et al., 1993

Proteome database:  Rat liver database
Special features:   Extensive studies on regulation of proteins by drugs and toxic agents
References:         Anderson and Anderson, 1991; Anderson et al., 1992; Richardson, Horn and
                    Anderson, 1994

Proteome database:  REF 52 rat cell line database
Special features:   Accessible via World Wide Web; quantitative spot measurements under
                    different conditions
References:         Garrels and Franza, 1989; Boutell et al., 1994

Proteome database:  SWISS-2DPAGE containing human reference maps
Special features:   Accessible via World Wide Web; completely integrated with SWISS-PROT and
                    SWISS-3DIMAGE
References:         Appel et al., 1993; Hochstrasser et al., 1992; Hughes et al., 1993;
                    Golaz et al., 1993

Proteome database:  Yeast Protein Database (YPD) and Yeast Electrophoretic Protein Database (YEPD)
Special features:   Completely crossreferenced organism database; YPD has extensive information
                    on over 3500 proteins; YEPD has many identifications
References:         Garrels et al., 1994

FEATURES OF PROTEOME DATABASES

Proteome projects rely heavily on computer databases to store information about all
proteins expressed by an organism. 'Proteome databases' should contain detailed
information of proteins already characterised elsewhere, as well as protein data from
2-D gels such as apparent pI and MW, expression level under different conditions,
subcellular localisation, and information on post-translational modifications. Images
of reference 2-D gels, showing protein SSP numbers and protein identifications,
should also be included. Ideally, proteome databases should be accessible with
Macintosh or IBM personal computers and easy to use. Some proteome databases and
the areas they cover are listed in Table 3. Databases range from collections of
annotated gels to large databases of images integrated with protein and nucleic acid
sequence banks.

One example of an integrated proteome database is the suite of SWISS-PROT,
SWISS-2DPAGE and SWISS-3DIMAGE databases (Appel et al., 1993; Appel et al.,
1994; Appel, Bairoch and Hochstrasser, 1994; Bairoch and Boeckmann, 1994). The
features of these three databases are listed in Table 4. SWISS-PROT, SWISS-2DPAGE
and SWISS-3DIMAGE are accessible through the World Wide Web
(Berners-Lee et al., 1992), allowing any computer connected to the internet to access
the stored information and images. Navigation within and between the three databases
is seamless, as all potential crosslinks are highlighted as hypertext on the display and
can be selected with a computer mouse. From these databases, detailed information
about a protein, including amino acid sequence and known post-translational
modifications, can be obtained, the precise protein spot it corresponds to on a reference gel
image can be viewed if known, and the 3-D structure of the molecule can be seen if
available. References to nucleic acid and other databases are also given to provide
access to information stored elsewhere.

Table 4: The SWISS-PROT, SWISS-2DPAGE and SWISS-3DIMAGE suite of crosslinked databases.
All three databases are accessible through the World Wide Web, at URL address:
http://expasy.hcuge.ch/

Information
  SWISS-PROT:     Text entries of sequence data; citation information; taxonomic data;
                  38,303 entries in Release 29
  SWISS-2DPAGE:   2-D gel images of: human liver, plasma, HepG2, HepG2 secreted proteins,
                  red blood cell, lymphoma, cerebrospinal fluid, macrophage like cell line,
                  erythroleukemia cell, platelet
  SWISS-3DIMAGE:  Collection of 330 3-D images of proteins

Annotations
  SWISS-PROT:     Protein function; post-translational modifications; domains; secondary
                  structure; quaternary structure; diseases associated with protein;
                  sequence conflicts
  SWISS-2DPAGE:   Gel images where protein is found; how protein identified; protein pI and
                  MW; protein number; normal and pathological variants
  SWISS-3DIMAGE:  All annotation is available in SWISS-PROT

Cross-Referenced Databases
  SWISS-PROT:     SWISS-2DPAGE; SWISS-3DIMAGE; EMBL; PIR; PDB; OMIM; PROSITE; Medline;
                  Flybase; GCRDb; MaizeDB; WormPep; DictyDB
  SWISS-2DPAGE:   SWISS-PROT and all other databases accessible through SWISS-PROT
  SWISS-3DIMAGE:  SWISS-PROT and all other databases accessible through SWISS-PROT

Other Features
  SWISS-PROT:     Navigation to other SWISS databases achieved by selecting entries with
                  computer mouse
  SWISS-2DPAGE:   Gel images show position of identified proteins, or region of gel where
                  protein should appear
  SWISS-3DIMAGE:  Mono and stereo images available; images can be transferred to local
                  computer image viewing programs

'Organism' databases, containing detailed protein and nucleic acid information
about a species, are becoming common as genome and proteome projects progress.
These differ from nucleic acid or protein sequence databases like GenBank or
SWISS-PROT because they are image based, and contain information about chromosomal
map positions, transcription of genes, and protein expression patterns. The
Escherichia coli gene-protein database (VanBogelen, Hutton and Neidhardt, 1990;
VanBogelen and Neidhardt, 1991; VanBogelen et al., 1992), known as the
ECO2DBASE, is one example. It contains gene and protein names, 2-D gel spot
information (including pI and MW estimates, and spot identification), genetic
information (GenBank or EMBL codes, chromosomal location, location on Kohara clones
(Kohara, Akiyama and Isono, 1987), transcription direction of genes), and protein
regulatory information (level of protein expression under different growth regimes,
member of regulon or stimulon). All entries in the ECO2DBASE are also
cross-referenced to the SWISS-PROT database (Bairoch and Boeckmann, 1994). It is
anticipated that organism databases will soon become a standard means of storing all
available information about a particular species. However, there is currently no
consistent manner in which organism databases are assembled, which may hamper
comparisons in the future.
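
To make the kinds of information listed above concrete, the Python sketch below models a single organism-database entry as a record. The field names, the lookup helper and every value in the example are hypothetical; this is not the actual schema of the ECO2DBASE or of any other database mentioned here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GeneProteinEntry:
    """One hypothetical entry linking a 2-D gel spot to genetic information."""
    gene_name: str
    protein_name: str
    spot_id: str
    pi_estimate: float
    mw_estimate_kd: float
    sequence_db_codes: List[str] = field(default_factory=list)  # e.g. GenBank or EMBL codes
    chromosomal_location: Optional[float] = None
    transcription_direction: Optional[str] = None
    expression_by_condition: Dict[str, float] = field(default_factory=dict)
    regulons: List[str] = field(default_factory=list)
    swissprot_accession: Optional[str] = None


def entries_in_regulon(entries, regulon):
    """Return the entries annotated as members of the given regulon."""
    return [e for e in entries if regulon in e.regulons]


# Placeholder values only; none of these describe a real gene or protein.
example = GeneProteinEntry(
    gene_name="exaA",
    protein_name="example protein A",
    spot_id="SPOT-0001",
    pi_estimate=5.4,
    mw_estimate_kd=42.0,
    expression_by_condition={"reference growth": 1.0, "heat shock": 3.2},
    regulons=["heat shock"],
)
print([e.gene_name for e in entries_in_regulon([example], "heat shock")])
```

Agreeing on a common record layout of this kind is one way the inconsistency between organism databases noted above could be reduced.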



Identification and characterisation of proteins from 2-D gels 

The number of proteins identified on a 2-D reference map determines its usefulness as 
a research and reference tool. As most reference maps have only a small proportion of 
proteins identified, a major aim of current proteome projects is to screen many proteins 
from 2-D maps, in order to define them as 'known' in current nucleic acid and protein 
databases, or as 'unknown'. Protein identification assists in confirmation of DNA 
open reading frames, and provides focus for DNA sequencing projects and protein 
characterisation efforts by pointing to proteins that are novel. Since there may be
3000-4000 proteins from a single 2-D map that require identification, the challenge in 
protein screening is to identify proteins quickly, with a minimum of cost and effort. 

Traditionally, proteins from 2-D gels have been identified by techniques such as
immunoblotting, N-terminal microsequencing, internal peptide sequencing,
comigration of unknown proteins with known proteins, or by overexpression of
homologous genes of interest in the organism under study (Matsudaira, 1987; Rosenfeld
et al., 1992; VanBogelen et al., 1992; Celis et al., 1993; Honore et al., 1993; Garrels
et al., 1994). Whilst these techniques are powerful identification tools, they are too
expensive or time and labour intensive to use in mass screening programs. A
hierarchical approach to mass protein identification has been recently suggested as an
alternative to traditional approaches (Table 5; Wasinger et al., 1995). This involves the
use of rapid and cheap identification tools such as amino acid analysis and peptide
mass fingerprinting as first steps in protein identification, followed by the use of
slower, more expensive and time consuming identification procedures if necessary. In
the construction of this hierarchy the analysis time, cost per sample and the complexity
of the data created have been considered, because whilst some techniques require little
machine time per sample, the analysis of data can be quite involved and time
consuming. Amino acid analysis and peptide mass fingerprinting based identification
techniques in the hierarchy are discussed in detail below. For review of other protein
identification techniques in Table 5, see Patterson (1994) and Mann (1995).

Table 5: Hierarchical analysis for mass screening of 2-D separated proteins blotted to membranes.
Rapid and inexpensive techniques are used as a first step in protein identification, and slower, more
expensive techniques are then used if necessary. Table modified from Wasinger et al., 1995.

Order  Identification technique                        References

1      Amino acid analysis                             Jungblut et al., 1992; Shaw, 1993;
                                                       Hobohm, Houthaeve and Sander, 1994;
                                                       Jungblut et al., 1994; Wilkins et al., 1995
2      Amino acid analysis with N-terminal             Wilkins et al., submitted
       sequence tag
3      Peptide-mass fingerprinting                     Henzel et al., 1993; Pappin, Hojrup and
                                                       Bleasby, 1993; James et al., 1993;
                                                       Mann, Hojrup and Roepstorff, 1993;
                                                       Yates et al., 1993; Mortz et al., 1994;
                                                       Sutton et al., 1995
4      Combination of amino acid analysis and          Cordwell et al., 1995;
       peptide mass fingerprinting                     Wasinger et al., 1995
5      Mass spectrometry sequence tag                  Mann and Wilm, 1994
6      Extensive N-terminal Edman microsequencing      Matsudaira, 1987
7      Internal peptide Edman microsequencing          Rosenfeld et al., 1992;
                                                       Hellman et al., 1995
8      Microsequencing by mass spectrometry            Johnson and Walsh, 1992
       (electrospray ionisation, post-source
       decay MALDI-TOF)
       Ladder sequencing                               Bartlet-Jones et al., 1994
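
The hierarchy in Table 5 can be read as a simple control flow: apply the cheapest technique first and escalate only when no confident identification results. The following is a minimal Python sketch of that idea. The `Technique` type, the `screen` function and the stand-in callbacks are hypothetical names introduced for illustration; they do not correspond to any published software.

```python
from typing import Callable, NamedTuple, Optional


class Technique(NamedTuple):
    name: str
    # Returns a protein identifier, or None when the result is ambiguous.
    identify: Callable[[str], Optional[str]]


def screen(spot: str, hierarchy: list[Technique]) -> tuple[Optional[str], list[str]]:
    """Apply techniques in order of increasing cost; stop at the first confident hit."""
    tried = []
    for technique in hierarchy:
        tried.append(technique.name)
        result = technique.identify(spot)
        if result is not None:
            return result, tried
    return None, tried


# Stand-in callbacks (assumptions for illustration only).
hierarchy = [
    Technique("amino acid analysis", lambda spot: None),
    Technique("peptide mass fingerprinting", lambda spot: "PYRI_ECOLI"),
    Technique("Edman microsequencing", lambda spot: "PYRI_ECOLI"),
]

protein, tried = screen("spot-1", hierarchy)
print(protein, tried)
```

In this sketch the expensive Edman step is never run, because the second technique already returns an identification.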

PROTEIN IDENTIFICATION BY AMINO ACID COMPOSITION 

There has been a revival of interest in the use of amino acid composition for
identification of proteins from 2-D gels after early work by Eckerskorn et al. (1988).
This technique uses a protein's idiosyncratic amino acid composition profile in order
to identify it by comparison with theoretical compositions of proteins in databases.
The amino acid composition of proteins can be determined by differential metabolic
radiolabelling and quantitative autoradiography after 2-D electrophoresis (Garrels et
al., 1994), or by hydrolysis of membrane-blotted proteins and
chromatographic analysis of the resulting amino acid mixture (Eckerskorn et al.,
1988; Tous et al., 1989; Gharahdaghi et al., 1992; Jungblut et al., 1992; Wilkins et al.,
1995). As differential metabolic labelling experiments require X-ray film or
phosphor-image plate exposures of up to 140 days, and can only be undertaken with easily
radiolabeled samples, the technique is not as rapid or widely applicable as
chromatography-based analysis. Proteins blotted to PVDF membranes can be hydrolysed in 1 h
at 155°C, amino acids extracted in a single brief step, and each sample automatically
derivatised and separated by chromatography in under 40 minutes (Wilkins et al.,
1995; Ou et al., 1995). In this manner, one operator can routinely analyse 100 proteins
per week on one HPLC unit. This technology lends itself to automation, and it is
anticipated that instruments with even greater sample throughput will be developed.
When proteins have been prepared by micropreparative 2-D electrophoresis (Hanash
et al., 1991; Bjellqvist et al., 1993b), blotted to a PVDF membrane and stained with
amido black, any visible protein spot is of sufficient quantity for amino acid analysis
(Cordwell et al., 1995; Wasinger et al., 1995; Wilkins et al., 1995).

Spot ECOLI-B1M

Composition
  Asx: 13.2   Glx: 10.4   Ser:  5.7   His:  0.7
  Gly:  5.4   Thr:  3.8   Ala:  6.7   Pro:  7.9
  Tyr:  1.3   Arg:  5.0   Val:  8.0   Met:  0.3
  Ile:  5.9   Leu:  8.0   Phe: 13.3   Lys:  4.4

pI estimate: 6.89      Range searched: ( 6.64,  7.14)
Mw estimate: 16800     Range searched: (13440, 20160)

Closest SWISS-PROT entries for the species ECOLI matched by AA composition:

Rank  Score  Protein      pI     Mw     Description
   1     24  PYRI_ECOLI   6.84   16989  ASPARTATE CARBAMOYLTRANSFERASE
   2     39  COAA_ECOLI   6.32   36359  PANTOTHENATE KINASE (EC 2.7.1.33)
   3     40  META_ECOLI   5.06   35713  HOMOSERINE O-SUCCINYLTRANSFERASE
   4     42  CADC_ECOLI   5.52   57812  TRANSCRIPTIONAL ACTIVATOR CADC.
   5     43  HLYC_ECOLI   8.58   19769  HEMOLYSIN C, PLASMID.

Closest SWISS-PROT entries for ECOLI with pI and Mw values in specified range:

Rank  Score  Protein      pI     Mw     Description
   1     24  PYRI_ECOLI   6.84   16989  ASPARTATE CARBAMOYLTRANSFERASE
   2    102  TRJ8_ECOLI   6.73   17921  TRAJ PROTEIN.
   3    112  YAJG_ECOLI   6.79   19028  HYPOTHETICAL LIPOPROTEIN YAJG.
   4    140  YFJB_ECOLI   6.83   14945  HYPOTHETICAL 14.9 KD PROTEIN IN GRPE
   5    142  YAHA_ECOLI   7.06   14726  HYPOTHETICAL PROTEIN IN BETT 3' REGION

Figure 4. Computer printout from ExPASy server where the empirical amino acid composition,
estimated pI and MW of a protein from a 2-D reference map of E. coli were matched against all entries in
SWISS-PROT for E. coli. The correct identification, aspartate carbamoyltransferase, is shown in bold. Low
scores indicate a good match. Note how matching within a defined pI and MW range (lower set of proteins)
has greatly increased the score difference between the first and second ranking proteins. This score
difference gives high confidence in the identification, and is only observed where the top ranking protein
is the correct identification (Wilkins et al., 1995).
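
The search ranges printed in Figure 4 (and later in Figure 5) are consistent with a window of ±0.25 pI units and ±20% of the estimated MW. These widths are inferred from the printed numbers rather than stated in the text, so the defaults below are assumptions; `search_window` is a hypothetical helper name.

```python
def search_window(pi_estimate: float, mw_estimate: float,
                  pi_tolerance: float = 0.25, mw_fraction: float = 0.20):
    """Return ((pI low, pI high), (MW low, MW high)) for restricted matching."""
    pi_range = (round(pi_estimate - pi_tolerance, 2),
                round(pi_estimate + pi_tolerance, 2))
    mw_range = (round(mw_estimate * (1 - mw_fraction)),
                round(mw_estimate * (1 + mw_fraction)))
    return pi_range, mw_range


# Spot from Figure 4: pI estimate 6.89, Mw estimate 16800.
print(search_window(6.89, 16800))
```

With the Figure 4 estimates this reproduces the printed ranges (6.64 to 7.14 and 13440 to 20160); with the Figure 5 estimates (pI 5.99, Mw 45000) it gives 36000 to 54000 for the MW range, again matching the printout.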

After the amino acid composition of a protein has been determined, computer
programs are used to match it against the calculated compositions of proteins in
databases (Eckerskorn et al., 1988; Sibbald, Sommerfeldt and Argos, 1991; Jungblut
et al., 1992; Shaw, 1993; Hobohm, Houthaeve and Sander, 1994; Wilkins et al.,
1995). Matching is usually done with only 15 or 16 amino acids, as cysteine and
tryptophan are destroyed during hydrolysis, asparagine and glutamine are deamidated
to their corresponding acids, and proline is not quantitated in some analysis systems.
The computer programs produce a list of best matching proteins, which are ranked by
a score that indicates the match quality. Some programs allow matching to be
restricted to specific 'windows' of MW and pI (Hobohm, Houthaeve and Sander,
1994; Wilkins et al., 1995), and to protein database entries for one species (Jungblut
et al., 1992; Wilkins et al., 1995). The use of such restrictions increases the power of
matching. An example of protein identification by amino acid composition is shown
in Figure 4. To date, amino acid composition has been used to identify proteins from
reference maps of Spiroplasma melliferum, Mycoplasma genitalium, E. coli,
Saccharomyces cerevisiae, Dictyostelium discoideum, human sera, human heart, human
lymphocyte, and mouse brain (Cordwell et al., 1995; Wasinger et al., 1995; Wilkins
et al., 1995; Jungblut et al., 1992, 1994; Garrels et al., 1994; Frey et al., 1994).

Spot ECOLI-ACJ

Composition
  Asx:  9.?   Glx: 10.8   Ser:  4.1   His:  2.7
  Gly: 12.2   Thr:  3.8   Ala: 11.9   Pro:  3.2
  Tyr:  6.?   Arg:  3.7   Val:  9.5   Met:  0.6
  Ile:  5.?   Leu:  8.2   Phe:  3.2   Lys:  4.9

pI estimate: 5.99      Range searched: ( 5.74,  6.24)
Mw estimate: 45000     Range searched: (36000, 54000)

Closest SWISS-PROT entries for ECOLI with pI and Mw values in specified range:

Rank  Score  Protein      pI     Mw     N-terminal
   1     21  GLYA_ECOLI   6.03   45316  M L K R E
   2     32  YJGB_ECOLI   5.86   36502  M S M I K
   3     38  GABT_ECOLI   5.78   45774  M S N S K
   4     44  YIHS_ECOLI   5.86   48018  M R I K Y
   5     45  DHE4_ECOLI   5.98   48581  M D Q T Y
   6     46  ARGD_ECOLI   5.79   43765  M A I E Q
   7     46  MURB_ECOLI   5.78   37851  M N H S L
   8     47  GLMU_ECOLI   5.98   49162  M L N N A
   9     47  ACKA_ECOLI   5.85   43290  M S S K L
  10     50  YJJN_ECOLI   6.01   37064  M E S R I

Figure 5. A PVDF protein spot from an E. coli 2-D reference map was sequenced for 4 cycles, and the
same sample then subjected to amino acid analysis. The N-terminal sequence was M L K R. When the amino
acid composition of the spot, as well as estimated pI and MW, were matched against all entries in
SWISS-PROT for E. coli, the above list of best matches was produced. N-terminal sequences are from SWISS-PROT
for those entries. The top ranking identification of serine hydroxymethyltransferase (bold) did not show a
large score difference between the first and second ranking proteins, giving little confidence in this being
the correct protein identification. However, the sequence tag (M L K R) confirmed the identity of the
protein as serine hydroxymethyltransferase.
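
The matching step described above can be sketched in a few lines of Python. Two caveats apply. First, the scoring function used here (sum of squared differences between percent compositions) is an assumption chosen for illustration; the text does not specify how the published programs compute their scores. Second, the database sequences are made up. Asn/Asp and Gln/Glu are pooled and Cys and Trp are dropped, as the text describes.

```python
from collections import Counter

# Asn/Asp and Gln/Glu are measured together after hydrolysis;
# Cys and Trp are destroyed and therefore excluded.
POOLED = {"N": "Asx", "D": "Asx", "Q": "Glx", "E": "Glx"}
DROPPED = {"C", "W"}


def composition(sequence: str) -> dict[str, float]:
    """Percent composition over the measurable residue classes."""
    counts = Counter(POOLED.get(aa, aa) for aa in sequence if aa not in DROPPED)
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def score(observed: dict[str, float], theoretical: dict[str, float]) -> float:
    """Sum of squared differences; lower means a closer match (assumed metric)."""
    keys = set(observed) | set(theoretical)
    return sum((observed.get(k, 0.0) - theoretical.get(k, 0.0)) ** 2 for k in keys)


def rank(observed: dict[str, float], database: dict[str, str]):
    """Return (score, name) pairs sorted from best to worst match."""
    return sorted((score(observed, composition(seq)), name)
                  for name, seq in database.items())


# Toy database with made-up sequences (hypothetical, for illustration).
database = {"PROT_A": "MKKLAADEEGGSTV", "PROT_B": "MWCPPPHHHYYFFRR"}
# Differs from PROT_A only by E -> Q, which is invisible once Glx is pooled.
observed = composition("MKKLAADEQGGSTV")
ranked = rank(observed, database)
print(ranked)
```

As in Figures 4 and 5, a low score indicates a good match, and the gap between the first and second scores is what lends confidence to the top entry.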

PROTEIN IDENTIFICATION BY AMINO ACID COMPOSITION AND N-TERMINAL 
SEQUENCE TAG 

When samples from 2-D gels are not unambiguously identified by amino acid
composition, pI and MW, often the correct identification of that protein is among the
top rankings of the list (Hobohm, Houthaeve and Sander, 1994; Cordwell et al., 1995;
Wilkins et al., 1995). Taking advantage of this observation, we have used the 'mass
spectrometry sequence tag' concept (Mann and Wilm, 1994) in developing a
combined Edman degradation and amino acid analysis approach to protein identification
(Wilkins et al., submitted). This involves the N-terminal sequencing of PVDF-blotted
proteins by Edman degradation for 3 or 4 cycles to create a 'sequence tag', following
which the same sample is used for amino acid analysis. As only a few amino acids are
removed from the protein, its composition is not significantly altered. Furthermore,
since only a small amount of protein sequence is required, fast but low repetitive yield
Edman degradation cycles can be used. Modifications to current procedures should
allow 3 cycles to be completed in 1 h, thereby allowing the screening of 100 or more
proteins per week on one automated, multi-cartridge sequenator. Amino acid
composition, pI and MW of proteins are matched against databases as described above, and
N-terminal sequences of best matching proteins are checked with the 'sequence tag'
to confirm the protein identity (Figure 5). This technique will be less useful when
proteins are N-terminally blocked, but as only a few N-terminal amino acids are
susceptible to the acetyl, formyl, or pyroglutamyl modifications that cause blockade,
this may itself provide useful information for sequence tag identification. A strength
of N-terminal sequence tag and amino acid composition protein identification is that
data generated are quickly and easily interpreted.
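
The confirmation step can be illustrated with the top five entries of Figure 5. The sketch below simply keeps the ranked candidates whose database N-terminus begins with the Edman tag; `confirm_by_tag` is a hypothetical helper name.

```python
# Ranked candidates from Figure 5: (score, SWISS-PROT entry, N-terminal residues).
candidates = [
    (21, "GLYA_ECOLI", "MLKRE"),
    (32, "YJGB_ECOLI", "MSMIK"),
    (38, "GABT_ECOLI", "MSNSK"),
    (44, "YIHS_ECOLI", "MRIKY"),
    (45, "DHE4_ECOLI", "MDQTY"),
]


def confirm_by_tag(candidates, tag: str) -> list[str]:
    """Keep candidates whose database N-terminus begins with the Edman tag."""
    return [entry for _, entry, n_term in candidates if n_term.startswith(tag)]


confirmed = confirm_by_tag(candidates, "MLKR")
print(confirmed)  # ['GLYA_ECOLI']
```

A practical implementation would also need to allow for removal of the initiator methionine or of a signal peptide from the mature protein, which this sketch ignores.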



PROTEIN IDENTIFICATION BY PEPTIDE MASS FINGERPRINTING 

Techniques for the identification of proteins by peptide mass fingerprinting have
recently been described (Henzel et al., 1993; Pappin, Hojrup and Bleasby, 1993;
James et al., 1993; Mann, Hojrup and Roepstorff, 1993; Yates et al., 1993; Mortz et
al., 1994; Sutton et al., 1995). This involves the generation of peptides from proteins
using residue-specific enzymes, the determination of peptide masses, and the
matching of these masses against theoretical peptide libraries generated from protein
sequence databases. As proteins have different amino acid sequences, their peptides
should produce characteristic 'fingerprints'.

The first step of peptide mass fingerprinting is protein digestion. Proteins within the
gel matrix or bound to PVDF can be enzymatically digested in situ, although in-gel
digests are reported to produce more enzyme autodigestion products, which
complicate subsequent peptide mass analysis (James et al., 1993; Rasmussen et al., 1994;
Mortz et al., 1994). The enzyme of choice for digestion is currently trypsin (of
modified sequencing grade), but other enzymes (Lys-C or S. aureus V8 protease) have
also been used (Pappin, Hojrup and Bleasby, 1993). To maximise the number of
peptides obtained, it is desirable for protein samples to be reduced and alkylated prior
to digestion (Mortz et al., 1994; Henzel et al., 1993). This ensures that all disulfide
bonds of the protein are broken, and produces protein conformations that are more
amenable to digestion. Surprisingly, chemical digestion methods such as cyanogen
bromide (methionine specific), formic acid (aspartic acid specific) and
2-(2'-nitrophenylsulfenyl)-3-methyl-3'-bromoindolenine (tryptophan specific) have not
been explored as means of peptide production for mass fingerprinting, even though
they are rapid and may circumvent some problems associated with enzyme digestions
(Nikodem and Fresco, 1979; Crimmins et al., 1990; Vanfleteren et al., 1992).

After proteins are digested, peptide masses are determined by mass spectrometry.
Direct analysis of peptide mixtures can be achieved by electrospray ionisation mass
spectrometry, plasma desorption mass spectrometry, or matrix assisted laser desorption
ionization (MALDI) mass spectrometry techniques. MALDI is preferable because of
its higher sensitivity and greater tolerance to contaminating substances from 2-D gels
(James et al., 1993; Mortz et al., 1994; Pappin, Hojrup and Bleasby, 1993).
Furthermore, recent modifications to sample preparation methods have largely solved early
difficulties experienced with the calibration of MALDI spectra (Mortz et al., 1994;
Vorm and Mann, 1994; Vorm, Roepstorff and Mann, 1994). The high sensitivity of
mass spectrometry allows a small fraction of a digest of a 1 µg protein spot to be used
for analysis, and analysis itself is complete in a few minutes.
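
The theoretical peptide libraries against which measured masses are matched are built by digesting database sequences in silico. The sketch below is one way to do this for trypsin, assuming the commonly used rule that trypsin cleaves C-terminal to Lys or Arg except before Pro, and assuming neutral monoisotopic masses; whether a given program uses monoisotopic or average masses is not stated in the text. It relies on zero-width splitting in `re.split`, which requires Python 3.7 or later.

```python
import re

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(sequence: str) -> list[str]:
    """Cleave C-terminal to Lys or Arg, except when the next residue is Pro."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


# Made-up sequence: the Arg-Pro bond is not cleaved.
peptides = tryptic_peptides("AAKGGRPLK")
print([(p, round(peptide_mass(p), 3)) for p in peptides])
```

Allowing for missed cleavages, as the search programs discussed below do, would mean also emitting concatenations of adjacent peptides.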

A major challenge associated with peptide mass fingerprinting is data interpretation
prior to computer matching against libraries of theoretical peptide digests. Spectra
must be examined carefully to determine which peaks represent peptide masses of
interest, as there are often enzyme autodigestion products and contaminating
substances present (Henzel et al., 1993; Mortz et al., 1994; Rasmussen et al., 1994).
Furthermore, if protein alkylation and reduction has not been undertaken prior to
protein digestion, peptide sequence coverage may be poor (40% to 70%), with some
masses present representing disulfide bonded peptides originally present in the protein
(Mortz et al., 1994). For eukaryotes, a serious issue is the alteration of peptide masses
by the presence of post-translational modifications (Table 6). The mass of the
unmodified peptide alone can be very difficult to determine. Two artifactual
modifications introduced by electrophoresis, an acrylamide adduct to cysteine and the
oxidation of methionine, are also known to alter peptide masses (le Maire et al., 1993;
Hess et al., 1993).
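
Because each modification shifts a peptide mass by a characteristic amount, one simple check during interpretation is to ask which modifications could explain the difference between an observed mass and the mass predicted for the unmodified peptide. The sketch below does this for a handful of modifications. The mass shifts are commonly quoted average values and the tolerance is arbitrary; both should be treated as assumptions, and `explain_shift` is a hypothetical helper name.

```python
# Average mass shifts in Da for a few modifications (commonly quoted values;
# treat them as assumptions and check against a current reference).
MODIFICATIONS = {
    "acetylation": 42.04,
    "acrylamide adduct to Cys": 71.08,
    "oxidation of Met": 16.00,
    "phosphorylation": 79.98,
    "sulfation": 80.06,
    "disulfide bond formation": -2.02,
}


def explain_shift(observed: float, theoretical: float, tolerance: float = 0.5) -> list[str]:
    """Return modifications whose mass shift accounts for observed - theoretical."""
    delta = observed - theoretical
    return [name for name, shift in MODIFICATIONS.items()
            if abs(delta - shift) <= tolerance]


# A peptide observed about 80 Da heavier than predicted.
print(explain_shift(1331.7, 1251.7))
```

The example returns both phosphorylation and sulfation, which differ by less than 0.1 Da; this is one concrete reason why modified peptides complicate interpretation at the mass accuracies discussed here.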

Table 6: Masses of some common post-translational modifications. Peptides carrying
post-translational modifications complicate data analysis for peptide mass fingerprinting protein
identification. This is especially so for protein glycosylation, which involves many different
combinations of the hexosamines, hexoses, deoxyhexoses, and sialic acid.

Post-translational modification             Mass change (Da)

Acetylation                                 +  42.04
*Acrylamide adduct to cysteine              +  71.08
Carboxylation of Asp or Glu                 +  44.01
Deamidation of Asn or Gln                   +   0.98
Disulfide bond formation                    -   2.02
Deoxyhexoses (Fuc)                          + 146.14
Formylation                                 +  28.01
Hexosamines (GlcN, GalN)                    + 161.16
Hexoses (Glc, Gal, Man)                     + 162.14
Hydroxylation                               +  16.00
N-acetylhexosamines (GlcNAc, GalNAc)        + 203.19
*Oxidation of Met                           +  16.00
Phosphorylation                             +  79.98
Pyroglutamic acid formed from Gln           -  17.03
Sialic acid (NeuNAc)                        + 291.26
Sulfation                                   +  80.06

Table modified from Finnigan LASERMAT application data sheet 5.
Asterisk (*) shows modifications that can arise artifactually from the 2-D electrophoresis process.

A number of computer programs are available for matching peptide masses against
databases (reviewed in Cottrell, 1994). Matching is usually undertaken in an
interactive manner, whereby peaks of mass 500-3000 Da are selected and matched under
various search parameters including MW of protein, mass accuracy of peptides, and
number of missed enzyme cleavages allowed (Henzel et al., 1993; Mortz et al., 1994;
Rasmussen et al., 1994). The correct protein identity is the protein which has the most
peptide masses in common with the unknown sample. Identities have been established
with as few as three peptides, but unambiguous identification is thought to require a
mass spectrometric map covering most peptides of the protein (Mortz et al., 1994;
Yates et al., 1993). To date, peptide mass fingerprinting of proteins has been
undertaken from the human myocardial protein and keratinocyte maps, from an E. coli
2-D gel, and from reference maps of Spiroplasma melliferum and Mycoplasma
genitalium (Sutton et al., 1995; Rasmussen et al., 1994; Henzel et al., 1993; Cordwell
et al., 1995; Wasinger et al., 1995), although the technique is most powerful when
used in combination with another protein identification technique (Rasmussen et al.,
1994; Cordwell et al., 1995).
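
The matching rule stated above, that the correct protein is the one sharing the most peptide masses with the sample, can be sketched as follows. The 500-3000 Da selection window comes from the text; the 1 Da mass tolerance, the function names and the library of theoretical digests are assumptions made for illustration.

```python
def count_matches(observed, theoretical, tolerance=1.0):
    """Number of observed masses within `tolerance` Da of any theoretical mass."""
    return sum(any(abs(m - t) <= tolerance for t in theoretical) for m in observed)


def rank_proteins(observed, library, low=500.0, high=3000.0, tolerance=1.0):
    """Rank database proteins by the number of peptide masses shared with the sample."""
    selected = [m for m in observed if low <= m <= high]
    hits = {name: count_matches(selected, masses, tolerance)
            for name, masses in library.items()}
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)


# Hypothetical theoretical digests (masses in Da), for illustration only.
library = {
    "PROT_A": [704.4, 831.5, 1031.5, 1283.6, 1613.8],
    "PROT_B": [650.3, 905.5, 1402.7, 2210.1],
}
# Two of the observed peaks fall outside the 500-3000 Da window and are ignored.
observed = [704.1, 831.9, 1283.2, 1613.5, 3500.0, 420.2]
ranking = rank_proteins(observed, library)
print(ranking)
```

Simple hit counting favours large proteins, which generate more peptides; restricting the search by protein MW, as the text describes, is one way to limit that bias.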

MASS SPECTROMETRY SEQUENCE TAGGING 

An extension of peptide mass fingerprinting has recently been described, called 
peptide sequence tagging (Mann and Wilm. 1994; Mann. 1995). This uses tandem 
mass spectrometry (MS/MS) to initially determine the mass of peptides, then subject 
them to fragmentation by collision with a gas, and finally determine the mass of 
fragments. The resulting spectra give information about a peptide's amino acid
sequence. The fragmentation masses of peptides can rarely be used to assign a complete
sequence, but they usually allow a short 'sequence tag' of 2 or 3 amino acids to be
determined. This sequence tag and the original peptide mass are matched by computer
against a database, providing a likely identity of the peptide and the protein it came from.
The major drawback of this technique as a mass screening tool is the complexity of the
mass data generated and the high level of expertise required for its interpretation. 
Nevertheless, it represents a useful new protein identification method which greatly 
increases the power of peptide mass fingerprinting protein identification. 
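
A much simplified version of sequence tag matching is sketched below. It assumes that the measurement yields three things: the peptide mass, a short tag, and the summed mass of the residues preceding the tag. A database peptide is accepted only if all three agree. This is an illustrative reduction of the concept, not the published algorithm; the function name, tolerance and candidate peptides are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def matches_tag(peptide, total_mass, prefix_mass, tag, tolerance=0.5):
    """True if the peptide has the measured mass and contains `tag`
    preceded by residues whose masses sum to `prefix_mass`."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    if abs(mass - total_mass) > tolerance:
        return False
    start = peptide.find(tag)
    while start != -1:
        prefix = sum(RESIDUE_MASS[aa] for aa in peptide[:start])
        if abs(prefix - prefix_mass) <= tolerance:
            return True
        start = peptide.find(tag, start + 1)
    return False


# Hypothetical candidate peptides of identical composition and mass.
candidates = ["GASPVK", "GSAPVK", "SPGAVK"]
hits = [p for p in candidates if matches_tag(p, 557.32, 128.06, "SP")]
print(hits)  # ['GASPVK']
```

All three candidates would be indistinguishable by peptide mass alone, which illustrates why even a tag of two residues adds so much discriminating power.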

Cross-species protein identification 

Protein sequence databases continue to grow at a rapid rate, yet it is not widely
appreciated that close to 90% of all information contained in current protein databases
comes from only 10 species (A. Bairoch, personal communication). Fortunately, this information
can be used to study proteomes of organisms that are poorly defined at the molecular
level, via 2-D electrophoresis and 'cross-species' protein identification (Cordwell et
al., 1995; Wasinger et al., 1995). This approach allows proteins from reference maps
of many different species to be identified without the need for the corresponding genes
to be cloned and sequenced. This is particularly true for 'housekeeping' proteins, such
as enzymes involved in glycolysis, DNA manipulation and protein manufacture,
which are highly conserved across species boundaries. Proteins that cannot be
identified across species boundaries can then become the focus of further protein
characterisation and DNA sequencing efforts.

A)

Protein APA1_HUMAN

  Asx:  8.4   Glx: 19.3   Ser:  6.3   His:  1.3
  Gly:  4.2   Thr:  4.3   Ala:  8.0   Pro:  4.2
  Tyr:  2.9   Arg:  6.7   Val:  5.5   Met:  1.3
  Ile:  0.0   Leu: 15.5   Phe:  2.5   Lys:  8.8

pI Range: no range specified
Mw Range: no range specified

The closest SWISS-PROT entries are:

Rank  Score  Protein      (pI     Mw)     Description
   1      0  APA1_HUMAN   5.27    28078   APOLIPOPROTEIN A-I.
   2      4  APA1_MACFA   5.43    28005   APOLIPOPROTEIN A-I.
   3     12  APA1_RABIT   5.15    27836   APOLIPOPROTEIN A-I.
   4     14  APA1_BOVIN   5.36    27549   APOLIPOPROTEIN A-I.
   5     14  APA1_CANFA   5.10    27467   APOLIPOPROTEIN A-I.
   6     18  APA1_MOUSE   5.42    27922   APOLIPOPROTEIN A-I.
   7     26  APA1_PIG     5.19    27598   APOLIPOPROTEIN A-I.
   8     27  APA1_CHICK   5.26    27966   APOLIPOPROTEIN A-I.
   9     37  DYNA_CHICK   5.44   117742   DYNACTIN, 117 KD ISOFORM.
  10     39  APA4_HUMAN   5.18    43374   APOLIPOPROTEIN A-IV.

B)

Reagent: Trypsin     MW filter: 10%
Scan using fragment mws of:

  1953  1933  1731  1613  1401  1387
  1301  1283  1252  1235  1231  1215
  1031   896   873   831   813   781
   732   704

No. of database entries scanned = 72018

   1. APA1_HUMAN   APOLIPOPROTEIN A-I (APO-AI) - HOMO SAPIENS
   2. APA1_MACFA   APOLIPOPROTEIN A-I (APO-AI) - MACACA FASCICULARIS
   3. APA1_PAPHA   APOLIPOPROTEIN A-I (APO-AI) - PAPIO HAMADRYAS
   4. B41845       orf B - Treponema denticola
   5. APA1_CANFA   APOLIPOPROTEIN A-I (APO-AI) - CANIS FAMILIARIS (DOG)
   6. S30947       hypothetical protein 1 - Azotobacter vinelandii
   7. HS2C_PEA     CHLOROPLAST HEAT SHOCK PROTEIN PRECURSOR. - PISUM SATIVU
   8. S20724       Tropomyosin - African clawed frog
   9. HIWI354      HIWI354 premature term. at 793 - Human immunodeficiency
  10. TRJ2_ECOLI   TRAJ PROTEIN. - ESCHERICHIA COLI.

Figure 6. Theoretical cross-species matching of human apolipoprotein A-I by amino acid composition
and tryptic peptides. When an unknown protein is analysed, best ranking proteins from both techniques can
be compared. If the same protein type is observed in both lists, there is high confidence in this being the
identity of the unknown molecule (Cordwell et al., 1995). (A) Output of ExPASy server (Appel, Bairoch
and Hochstrasser, 1994) where the true amino acid composition of apolipoprotein A-I was matched against
all entries in the SWISS-PROT database, without pI or MW windows. Seven of the top 10 matching
proteins were apolipoprotein A-I of different species. (B) Output of MOWSE peptide mass fingerprinting
program (Pappin, Hojrup and Bleasby, 1993) where true tryptic peptides of human apolipoprotein A-I were
matched against the OWL database, using MW window of 10%. Four of the top ten matching proteins were
apolipoprotein A-I from different species.

Rapid cross-species identification of proteins from 2-D reference maps can be
undertaken with amino acid composition or peptide mass fingerprinting methods
(Figure 6), but these techniques alone may not identify proteins unambiguously when
phylogenetic cross-species distances are great or analysis data is of poor quality (Yates
et al., 1993; Shaw, 1993; Cordwell et al., 1995). However, very high confidence in
protein identities can be achieved when lists of best-matching proteins generated by
both techniques are compared (Cordwell et al., 1995; Wasinger et al., 1995). The
correct identification is found when the same protein is ranked highly in lists of best
matches generated by both techniques. This method has allowed approximately 120
proteins from the reference map of the mollicute Spiroplasma melliferum,
representing approximately one quarter of the proteome, to be confidently identified by
reference to protein information from other species (S. Cordwell, personal
communication). When cross-species protein identification is to be undertaken, it should be
noted that the molecular weight of a protein type across species is usually highly
conserved, but that protein pI can vary by more than 2 units (Cordwell et al., 1995).
Accurate molecular weight determination by direct mass spectrometry of proteins
blotted to PVDF (Eckerskorn et al., 1992) should therefore be a useful additional
parameter for cross-species protein identification.
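
Comparing the two ranked lists can be reduced to a small set operation. The sketch below uses the top five entries from panels (A) and (B) of Figure 6 and treats the part of a SWISS-PROT style name before the underscore as the protein type, so that the same protein from different species counts as a match. The function names and the choice of the top five entries are assumptions for illustration.

```python
def protein_type(entry: str) -> str:
    """SWISS-PROT style names are TYPE_SPECIES; keep the TYPE part."""
    return entry.split("_")[0]


def consensus(list_a, list_b, top=5):
    """Protein types found in the top `top` entries of both lists, in list_a order."""
    types_b = {protein_type(e) for e in list_b[:top]}
    seen, shared = set(), []
    for entry in list_a[:top]:
        kind = protein_type(entry)
        if kind in types_b and kind not in seen:
            seen.add(kind)
            shared.append(kind)
    return shared


# Top entries from Figure 6: (A) amino acid composition, (B) peptide masses.
by_composition = ["APA1_HUMAN", "APA1_MACFA", "APA1_RABIT", "APA1_BOVIN", "APA1_CANFA"]
by_peptide_mass = ["APA1_HUMAN", "APA1_MACFA", "APA1_PAPHA", "B41845", "APA1_CANFA"]
shared = consensus(by_composition, by_peptide_mass)
print(shared)  # ['APA1']
```

Spurious entries such as B41845 appear in only one list and drop out, which is the effect the text attributes to combining the two techniques.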

CHARACTERISATION OF POST-TRANSLATIONAL MODIFICATIONS

Many proteins are modified after translation. Such post-translational modifications,
including glycosylation, phosphorylation, and sulfation (see Table 6), are usually
necessary for protein function or stability. Some abnormal modifications are
associated with disease (Duthel and Revol, 1993; Ghosh et al., 1993; Yamashita et al.,
1993). In proteome studies, post-translational modifications can be examined on all
proteins present, or on individual spots. Studies on all proteins provide an indication
of which proteins may carry a certain type of modification. For example, 2-D gel
analysis of cell cultures grown in the presence of [3H]mannose or [32P]phosphate
gives an indication of which proteins carry glycans containing mannose, and which
proteins are phosphorylated (Garrels and Franza, 1989). Lectin binding studies of 2-D
gels blotted to PVDF or nitrocellulose provide information on the saccharides, if any,
that are carried by proteins present (Gravel et al., 1994).

When individual proteins of interest carrying post-translational modifications have
been found, micropreparative 2-D electrophoresis can be used to purify them in
microgram quantities (Hanash et al., 1991; Bjellqvist et al., 1993b). If protein
isoforms of similar MW and pI are to be studied, focusing with narrow range pI
gradients (1 pH unit) can provide greater separation and resolution. After
electrophoresis, the type and degree of protein phosphorylation can be investigated (Murthy
and Iqbal, 1991; Gold et al., 1994), monosaccharide composition can be determined
(Weitzhandler et al., 1993; Packer et al., 1995), and the structure and exact site of
glycoamino acids can be investigated by either Edman degradation based techniques
or by mass spectrometry (Pisano et al., 1993; Huberty et al., 1993; Carr, Huddleston
and Bean, 1993). With further development of rapid techniques, investigation of
phosphorylation and monosaccharides by chromatographic or mass spectrometric
means is likely to become a routine step in the characterisation of post-translational
modifications of proteins from reference maps.

The status of proteome projects

Many technical aspects of proteome research have already been discussed in this
review, but an overview of the status of proteome projects has not yet been presented.
Advances in proteome projects will initially rely on progress in genome sequencing
initiatives, to enable an identity, amino acid sequence, or function to be assigned to
each protein spot. Table 7 shows genome size, proteome size, and the number of
proteins already defined for a number of model organisms. This indicates that whilst
genome sequencing programs for E. coli and S. cerevisiae are advanced, the massive
size of some other genomes (and especially the human genome) means that their
complete nucleotide sequences are unlikely to be available for many years. Because of
this, 2-D reference maps and proteome projects of single cell organisms like
Mycoplasma sp., E. coli and S. cerevisiae will be the most detailed (Cordwell et al., 1995;
Wasinger et al., 1995; VanBogelen et al., 1992; Garrels et al., 1994), and complete
maps of other organisms will take longer to construct. However, the use of
cross-species protein identification techniques will allow proteomes of many prokaryotes
and simple eukaryotes to be partially defined in reference to E. coli and S. cerevisiae.
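
The gap between well-characterised and poorly characterised organisms can be made concrete with the Table 7 figures. The short calculation below expresses the number of SWISS-PROT entries as a percentage of the estimated proteome size for three of the listed species; the `coverage` helper is a hypothetical name, and the percentages are only as good as the proteome size estimates.

```python
# (SWISS-PROT entries, estimated proteome size) taken from Table 7.
table7 = {
    "Escherichia coli": (3170, 4000),
    "Saccharomyces cerevisiae": (3160, 6000),
    "Caenorhabditis elegans": (703, 17800),
}


def coverage(entries: int, proteome_size: int) -> float:
    """SWISS-PROT entries as a percentage of the estimated proteome."""
    return 100.0 * entries / proteome_size


for species, (entries, size) in table7.items():
    print(f"{species}: {coverage(entries, size):.0f}% of the estimated proteome")
```

On these figures roughly four fifths of the E. coli proteome and about half of the S. cerevisiae proteome already have database entries, against a few percent for C. elegans, which is why the two microbes serve as the reference points for cross-species identification.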

Table 7: Estimated genome size, estimated proteome size, number of protein sequences in
SWISS-PROT Release 31 (March, 1995), and approximate number of proteins of known identity on 2-D
reference maps for some model organisms. Genome size data from Smith (1994), and total protein data
from Bird (1995). Genome sequencing projects of E. coli and S. cerevisiae will probably be complete in
1996.

Species name                Haploid        Estimated          Protein        Proteins
                            genome size    proteome size      entries in     annotated on
                            (million bp)   (total proteins)   SWISS-PROT     2-D maps

Mycoplasma species          0.6-0.8        400-600            100            > 100
Escherichia coli            4.8            4000               3170           > 300
Saccharomyces cerevisiae    13.5           6000               3160           > 100
Dictyostelium discoideum    70             12500              204
Arabidopsis thaliana        70             14000              270
Caenorhabditis elegans      80             17800              703
Homo sapiens                2900           60000-80000        3326           > 1000


The study of vertebrate proteomes and vertebrate development is a phenomenal
undertaking in comparison to the investigation of single cell organisms. This is
because vast numbers of proteins are developmentally expressed, each body tissue has
hundreds of unique proteins, and there are numerous tissue types. However, it is
estimated that at least 35% of proteins in vertebrate cells will be conserved from tissue
to tissue, constituting the 'housekeeping' proteins (Bird, 1995), with the remainder of
proteins constituting a set that are specific to a cell type. Providing that standardised
electrophoretic conditions are used, reference maps from many tissues of one
organism can be superimposed in gel databases (e.g. Hochstrasser et al., 1992). This
accelerates the definition of the 'housekeeping' proteins, as well as sets of proteins that
are unique to different tissue types. Such studies may, however, be complicated by
post-translational modifications, which can differ on the same gene product in
different tissues. Proteins that remain unknown after identification procedures will be
useful in providing focus for nucleic acid sequencing initiatives.

FUTURE DIRECTIONS OF PROTEOME PROJECTS

This review has described recent advances in the area of proteome research. It has
illustrated how new developments of older techniques (2-D electrophoresis and amino
acid analysis) as well as the applications of new technology (mass spectrometry) have
greatly widened the choice of tools the biologist and protein chemist has for the
separation, identification and analysis of complex mixtures of proteins. This has made
possible the establishment of detailed reference maps for organisms, which are
becoming the method of choice for the definition of tissues or whole cells, and the
investigation of gene expression therein.

Proteome projects are already impacting on the dogma of molecular biolosv that 
DN A sequence constitutes the definition ot an organism. For example, the proieomes 
of different tissues of a single organism are often significantly different. Similarly, 
cross-species identification of proteins (for example the identification of proteins 
from Candida albicans by comparison with S. cerevisiae) can open up studies on 
organisms that are poorly molecularly defined. As cross-species identification can 
proceed at a pace orders of magnitude faster than a genome project in terms of 
defining the gene and protein complement of organisms, the need for the DNA
sequencing of genomes will be avoided, and emphasis placed on those found to be
novel. 

Just as genome sequencing is not an end in itself, neither is an annotated 2-D protein 
reference map of an organism, nor indeed the identification of proteins in a proteome. 
So whilst an immediate aim of proteome projects is to screen proteins in reference 
maps, this will lead to expression studies and characterisation of post-translational 
modifications. The challenge that then needs to be addressed is the investigation of 
structure and function of proteins in a proteome. The magnitude of this is illustrated by 
the fact that over half the open reading frames identified in S. cerevisiae chromosome
III were initially of no known function (Oliver et al., 1992). Structural and functional 
studies will be an undertaking just as formidable as genome studies are now and 
proteome projects are becoming, but will lead to an unimaginably detailed under- 
standing of how living organisms are constructed and how they operate. 
METHODOLOGY



Human cellular protein patterns and their link to genome DNA 
sequence data: usefulness of two-dimensional gel 
electrophoresis and microsequencing 

JULIO E. CELIS,* HANNE H. RASMUSSEN,* HENRIK LEFFERS,* PEDER MADSEN,* BENT HONORE,*
BORBALA GESSER,* KURT DEJGAARD,* JOEL VANDEKERCKHOVE†

*Institute of Medical Biochemistry and Human Genome Research Centre, Aarhus University, DK-8000 Aarhus, Denmark and †Laboratorium
voor Fysiologische Chemie, Rijksuniversiteit Gent, Belgium



ABSTRACT Analysis of cellular protein patterns by 
computer-aided 2-dimensional gel electrophoresis together 
with recent advances in protein sequence analysis have 
made possible the establishment of comprehensive 
2-dimensional gel protein databases that may link protein
and DNA information and that offer a global approach
to the study of the cell. Using the integrated approach
offered by 2-dimensional gel protein databases it
is now possible to reveal phenotype specific protein (or 
proteins), to microsequence them, to search for homology 
with previously identified proteins, to clone the cDNAs, 
to assign partial protein sequence to genes for which the 
full DNA sequence and the chromosome location is 
known, and to study the regulatory properties and func- 
tion of groups of proteins that are coordinately expressed 
in a given biological process. Human 2-dimensional gel 
protein databases are becoming increasingly important in 
view of the concerted effort to map and sequence the en- 
tire genome. Celis, J. E.; Rasmussen, H. H.; Leffers, 

H.; Madsen, P.; Honore, B.; Gesser, B.; Dejgaard, K.; 
Vandekerckhove, J. Human cellular protein patterns and 
their link to genome DNA sequence data: usefulness of 
two-dimensional gel electrophoresis and microsequencing.
FASEB J. 5: 2200-2208; 1991. 

Key Words: human protein patterns • 2-dimensional gel protein
databases • gene expression • microsequencing • cDNA cloning
• linking protein and DNA information • genome mapping and sequencing



Proteins synthesized from information contained in the 
DNA orchestrate most cellular functions. The total number
of proteins synthesized by a typical human cell is unknown 
although current estimates range from 3000 to 6000. Of 
these, as many as 70% may perform household functions 
and are expected to be shared by all cell types irrespective of 
their origin. There are many different cell types in the hu- 
man body with perhaps 30,000 to 50,000 proteins expressed 
in the organism as a whole judged from the fact that about 
3% of the haploid genome corresponds to genes. Today only
a small fraction of the total set of proteins has been identified, 
and little is known about the protein patterns of individual 
cell types or their variation under physiological and abnor- 
mal conditions. 

For the past 15 years, high resolution 2-dimensional gel 
electrophoresis has been the technique of choice to deter- 
mine the protein composition of a given cell type and for 
monitoring changes in gene activity through quantitative 
and qualitative analysis of the thousands of proteins that or- 
chestrate various cellular functions (refs 1-6 and references 



therein). The technique originally described by O'Farrell
separates proteins in terms of their isoelectric point (pI) and
molecular weight. Usually one chooses a condition of interest
and the cell reveals the global protein behavioral
response as all detected proteins can be analyzed both 
qualitatively and quantitatively in relation to each other. At 
present, most available 2-dimensional gel techniques (regu- 
lar gel format) can resolve between 1000 and 2000 proteins 
from a given mammalian cell type, a number that corresponds
to about 2 million base pairs of coded DNA. Less
abundant proteins can be detected by analyzing partially
purified cellular fractions.
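
The correspondence between 1000 to 2000 resolved proteins and about 2 million base pairs can be checked with simple arithmetic. The sketch below is only an illustration: the average coding length of 1000 bp per protein and the average residue mass of 110 Da are round values assumed here, not figures given in the text.

```python
# Back-of-envelope check of the coding capacity represented by one gel.
# ASSUMPTIONS (not from the text): both constants are round illustrative values.
AVG_CODING_BP = 1000    # assumed mean coding length per protein (base pairs)
AVG_RESIDUE_DA = 110    # assumed mean mass of one amino acid residue (Da)


def coded_dna_bp(n_proteins, avg_coding_bp=AVG_CODING_BP):
    """Total coding DNA (bp) represented by n_proteins resolved spots."""
    return n_proteins * avg_coding_bp


def mean_protein_kda(avg_coding_bp=AVG_CODING_BP, residue_da=AVG_RESIDUE_DA):
    """Mean protein mass (kDa) implied by the assumed coding length."""
    residues = avg_coding_bp // 3    # three bases per codon
    return residues * residue_da / 1000.0


if __name__ == "__main__":
    for n in (1000, 2000):
        print(n, "proteins ->", coded_dna_bp(n), "bp")
    print("implied mean protein mass:", mean_protein_kda(), "kDa")
```

Under these assumptions the upper end of the range (2000 proteins) gives the 2 million base pairs quoted above, and the implied mean protein mass falls within the molecular weight range that such gels resolve.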

Two-dimensional gel electrophoresis has been widely applied
to analysis of cellular protein patterns from bacteria to mammalian
cells (refs 1-6, and references therein). In spite of
much work, however, information gathered from these 
studies has not reached the scientific community in its full- 
ness because of lack of standardized gel systems and the lack 
of means for storing and communicating protein information.
Only recently, because of the development of appropriate
computer software (7-13), has it been possible to scan
gels, assign numbers to individual proteins, and store the
wealth of information in quantitative and qualitative comprehensive
2-dimensional gel protein databases (4, 14-23),
i.e., those containing information about the various properties
(physical, chemical, biological, biochemical, physiological,
genetic, immunological, architectural, etc.) of all the
proteins that can be detected in a given cell type. Such in- 
tegrated 2-dimensional gel protein databases offer an easy 
and standardized medium in which to store and communi- 
cate protein information and provide a unique framework in 
which to focus a multidisciplinary approach to study the cell. 
Once a protein is identified in the database, all of the infor- 
mation accumulated can be easily retrieved and made availa- 
ble to the researcher. In the long run, protein databases are 
expected to foster a wide variety of biological information 
that may be instrumental to researchers working in many 
areas of biology — among others, cancer and oncogene 
studies, differentiation, development, drug development and 
testing, genetic variation, and diagnosis of genetic and clini- 
cal diseases (Fig. 1). 

The approach using systematic 2-dimensional gel protein 
analysis has recently gained a new dimension with the ad- 
vent of techniques to microsequence major proteins recorded 
Figure 1. Interface between partial protein sequence databases,
comprehensive 2-dimensional gel databases, and the human genome
sequencing project. Appropriate software is required to compare
protein and DNA sequences. In general, although the inference
of a protein's sequence from the DNA sequence (thick arrow)
is direct and unambiguous, the DNA sequence can only be inferred
approximately from the protein sequence (thin arrow) and cloning
of the gene requires either a cDNA or the requisite group of
oligonucleotide probes deduced from the partial amino acid sequence.
Modified from ref 6.



in the databases (refs 24-42 and references therein). Partial 
protein sequences can be used to search for protein identity
as well as to prepare specific DNA probes for cloning as-yet-
uncharacterized proteins (Fig. 1). As these sequences can be
stored in the database (see for example Fig. 2H), they offer
a unique opportunity to link information on proteins with
the existing or forthcoming DNA sequence data on the human
genome (Fig. 1) (20, 36, 39).
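
Because most amino acids are encoded by more than one codon, a partial protein sequence specifies a group of possible DNA sequences rather than a single one, which is why a set of oligonucleotide probes is needed. The following sketch counts how many DNA sequences could encode a peptide under the standard genetic code and picks the stretch that needs the fewest probes. The function names are illustrative, and the example peptide is one of the lipocortin V sequences listed in Table 1.

```python
# Number of synonymous codons per amino acid in the standard genetic code.
CODON_COUNTS = {
    "M": 1, "W": 1,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "I": 3,
    "V": 4, "P": 4, "T": 4, "A": 4, "G": 4,
    "L": 6, "S": 6, "R": 6,
}


def probe_degeneracy(peptide):
    """Number of distinct DNA sequences that could encode the peptide."""
    total = 1
    for residue in peptide:
        total *= CODON_COUNTS[residue]
    return total


def least_degenerate_window(peptide, length):
    """Return (start index, sub-peptide, degeneracy) of the window that
    would require the smallest group of oligonucleotide probes."""
    windows = [
        (start, peptide[start:start + length],
         probe_degeneracy(peptide[start:start + length]))
        for start in range(len(peptide) - length + 1)
    ]
    return min(windows, key=lambda w: w[2])


if __name__ == "__main__":
    peptide = "GTVTDFPGFDER"    # lipocortin V, residues 7-18 (Table 1)
    print(probe_degeneracy(peptide))
    print(least_degenerate_window(peptide, 6))
```

Choosing a window rich in residues with one or two codons keeps the probe mixture small; real probe design would also weigh codon usage and melting temperature, which this sketch ignores.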

Using the integrated approach offered by comprehensive 
2-dimensional gel databases (Fig. 1), it will be possible to 
identify phenotype-specific proteins; microsequence them 
and store the information in the database; search for homology
with previously characterized proteins; clone the
cDNAs; assign partial protein sequences to genes for which
the full DNA sequence and the chromosome location are
known; and study the regulatory properties and function of
groups of proteins (pathways, organelles, etc.) that are coordinately
expressed in a given biological process. Comprehensive
2-dimensional gel protein databases will depict an integrated
picture of the expression levels and properties of the
thousands of protein components of organelles, pathways,
and cytoskeletal systems in both physiological and abnormal
conditions and are expected to lead to identification of new
regulatory networks in different cell types and organisms. In
the future, 2-dimensional gel protein databases may be 
linked to each other as well as to national and international 
specialized databanks on nucleic acid and protein sequences, 
protein structures, NMR experimental data, complex carbo- 
hydrates, etc. 

A few 2-dimensional gel protein databases that are accessible 
in a computer form have been published in extenso: these 
correspond to the protein-gene database of Escherichia coli 
K-12 developed by Neidhardt and colleagues (14, 23), the rat 
REF 52 database established by Garrels and co-workers at 
Cold Spring Harbor (18, 22), and a few human databases
(transformed amnion cells [15, 20], normal embryonal lung
MRC-5 fibroblasts [17, 21], keratinocytes [19] and peripheral
blood mononuclear cells [15]) developed in Aarhus. Given 
space limitations and to keep this review in focus, we will 
concentrate on the computerized analysis of human cellular 
2-dimensional gel patterns, and in particular on the steps in- 
volved in establishing comprehensive 2-dimensional gel 
databases that can link protein and DNA information. 



MAKING AND MANAGING A COMPREHENSIVE 
2-DIMENSIONAL GEL DATABASE OF HUMAN 
CELLULAR PROTEINS 

The first step in making a comprehensive 2-dimensional gel
protein database is to prepare a synthetic image (digital form
of the gel image) of the gel (fluorogram, Coomassie blue or silver
stained gel) to be used as a standard or master reference.
This can be done with laser scanners, charge couple device
(CCD) array scanners, television cameras, rotating drum
scanners, and multiwire chambers (13). Computerized analysis
systems for spot detection, quantitation, pattern matching,
and data handling (access and retrieval of information,
database making) have been described in the literature
(ELSIE [43], GELLAB [11], HERMeS [44], MELANIE
[10], QUEST [9], and TYCHO [8]) and some are available
commercially (PDQUEST, Protein Databases Inc., Huntington,
N.Y.; KEPLER, Large Scale Biology, Rockville, Md.;
Visage, BioImage Corporation, Ann Arbor, Mich.; Gemini,
Joyce Loebl, Gateshead; Microscan 1000, Technology
Resources Inc., Nashville, Tenn.; and MasterScan, Billerica,
Mass.). Unfortunately, most of these systems are incompatible
with one another and their advantages and disadvantages
have been discussed by Miller (13).

In our work station in Aarhus, fluorograms are scanned
with a Molecular Dynamics laser scanner and the data are
analyzed using the PDQUEST II software (Protein Databases
Inc.) (12) running on a SPARCstation computer 4100
FC-8-P3 from SUN Microsystems, Inc. The scanner measures
intensity in the range of 0-2.0 absorbance. A typical
scan of a 17 x 17 cm fluorogram takes about 2 min. Steps
in image analysis include: initial smoothing, background
subtraction, final smoothing, spot detection, and fitting of
ideal Gaussian distributions to spot centers. Spot intensity is
calculated as the integration of a fitted Gaussian. If calibration
strips containing individual segments of a known
amount of radioactivity are used, it is possible to merge multiple
exposures of the sample image into a single data image
of greater dynamic range. Once the synthetic image is
created it can be stored on disk and displayed directly on the
monitor. Functions that can be used to edit the images include:
cancel (for example, to erase scratches that may have
been interpreted as spots by the computer; cancel streaks or
low dpm spots), combine (sometimes a spot may be resolved
into several closely packed spots), restore, uncombine, and
add spot to the gel. The process is time consuming: about
1 1/2 days per image. Edited standard images can be matched
to other synthetic images. Figure 2A shows a portion of a
standard synthetic image (IEF) of a fluorogram of
[35S]methionine labeled cellular proteins from human AMA
cells (master database) (20). Images can be displayed either
in black and white (resembling the original fluorograms) or
in color (other images in Fig. 2), depending on the need. As
shown in Fig. 2B, each polypeptide is assigned a number by
the computer, which facilitates the entry and retrieval of
qualitative and quantitative information for any given spot
in the gel (20). The standard image can be matched automatically
by the computer to other standard or reference gels
(Fig. 2C, matching of AMA cellular proteins [left] to MRC-5
proteins [right]) provided a few landmark spots are given
manually as reference (indicated with a + in Fig. 2C) to initiate
the process.
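
The quantitation step described above (background subtraction, fitting an ideal Gaussian to a spot, and integrating the fitted Gaussian) can be illustrated with a small sketch. This is not the PDQUEST algorithm: it assumes a single isolated spot, a constant background, and a fit by image moments, and all function names are hypothetical.

```python
import math


def render_spot(amplitude, x0, y0, sx, sy, size, background=0.0):
    """Synthetic test image: one ideal 2-D Gaussian spot on a flat background."""
    return [
        [background + amplitude * math.exp(-((x - x0) ** 2) / (2 * sx ** 2)
                                           - ((y - y0) ** 2) / (2 * sy ** 2))
         for x in range(size)]
        for y in range(size)
    ]


def subtract_background(image, background):
    """Remove a constant background level, clipping negative values to zero."""
    return [[max(v - background, 0.0) for v in row] for row in image]


def fit_gaussian_by_moments(image):
    """Estimate centre, widths and amplitude of a single spot from image moments."""
    total = sum(sum(row) for row in image)
    x0 = sum(x * v for row in image for x, v in enumerate(row)) / total
    y0 = sum(y * v for y, row in enumerate(image) for v in row) / total
    var_x = sum((x - x0) ** 2 * v for row in image for x, v in enumerate(row)) / total
    var_y = sum((y - y0) ** 2 * v for y, row in enumerate(image) for v in row) / total
    sx, sy = math.sqrt(var_x), math.sqrt(var_y)
    amplitude = total / (2 * math.pi * sx * sy)
    return {"x0": x0, "y0": y0, "sx": sx, "sy": sy, "amplitude": amplitude}


def integrated_intensity(fit):
    """Volume under the fitted Gaussian: 2 * pi * amplitude * sx * sy."""
    return 2 * math.pi * fit["amplitude"] * fit["sx"] * fit["sy"]


if __name__ == "__main__":
    raw = render_spot(100.0, 20, 20, 2.0, 3.0, size=41, background=5.0)
    fit = fit_gaussian_by_moments(subtract_background(raw, 5.0))
    print(fit, integrated_intensity(fit))
```

A moment-based fit is the simplest choice for an isolated spot; overlapping spots, which the combine and uncombine editing functions address, would need a simultaneous fit of several Gaussians.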



Abbreviations: CCD, charge couple device; PCNA, proliferating
cell nuclear antigen; HPLC, high performance liquid chromatography.
Figure 2. A) Synthetic image of a fraction of an IEF gel of the master image of AMA cellular proteins. B) As in A but showing numbers
assigned to each spot. C) Comparison of AMA (left) and normal human embryonal lung MRC-5 fibroblasts (right) IEF protein patterns.
Matched proteins are indicated by a + or by the same letters in both gels. Once a protein is matched, information contained in the various
categories available in the master AMA database can be transferred. D) Synthetic image of a fraction of an IEF fluorogram of [35S]methionine
labeled proteins from normal human MRC-5 fibroblasts. The histograms show levels of synthesis of a few proteins in MRC-5 (left
bar) and SV40 transformed MRC-5 (right bar) fibroblasts. E) Polypeptides that contain information under the category glycolytic pathway.
F) The function peruse annotation for spot allows the operator to inquire about categories and information available for a given protein.
G) Relative abundance of cytoskeletal and cytoskeletal-related proteins in quiescent, proliferating, and SV40-transformed MRC-5 fibroblasts.
H) Polypeptides that contain information under the category partial amino acid sequences.






[Figure 2 image. The spot markers and partial amino acid sequence labels printed on the panels are not recoverable from the scan.]



The automatic matching process that has been described 
in detail by Garrels et al. (12) takes about 5 min. Matched 
proteins are indicated with the same letters in both gels (Fig. 
2C). The usefulness of this function is emphasized by the fact 
that data accumulated on common "household" proteins can
be easily transferred to any other human cellular cell type 
whose 2-dimensional gel cellular protein pattern is matched 



to our standard AMA 2-dimensional gel protein image. Al- 
ternatively, if the standard gel is part of a matchset (set of 
gels in a given experiment) it can be used as a linker gel to 
compare, for example, the quantitative values of a given pro- 
tein throughout the experiment (see Fig. 2D; levels of some 
proteins in normal and SV40 transformed human MRC-5 
fibroblasts) or with other standard images in different sets of 
cross-matched experiments (18, 22).
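
Landmark-initiated matching of two gels can be sketched as follows. This is a simplified stand-in, not the procedure of Garrels et al. (12): it assumes that the two gels differ only by an independent scale and shift along each axis, estimates that transform from the operator's landmark pairs by least squares, and then pairs each master spot with the nearest spot in the other gel. All names and the tolerance value are hypothetical.

```python
import math


def fit_axis(src, dst):
    """Least-squares line dst = scale * src + offset for one coordinate axis."""
    n = len(src)
    mean_s, mean_d = sum(src) / n, sum(dst) / n
    var_s = sum((s - mean_s) ** 2 for s in src)
    cov = sum((s - mean_s) * (d - mean_d) for s, d in zip(src, dst))
    scale = cov / var_s
    return scale, mean_d - scale * mean_s


def match_spots(master, other, landmarks, tolerance=2.0):
    """Match spots of `master` to `other` after aligning on landmark pairs.

    master, other: dicts mapping spot id -> (x, y)
    landmarks: list of (master_id, other_id) pairs supplied by the operator
    Returns a dict master_id -> other_id for spots found within `tolerance`.
    """
    ax, bx = fit_axis([master[m][0] for m, _ in landmarks],
                      [other[o][0] for _, o in landmarks])
    ay, by = fit_axis([master[m][1] for m, _ in landmarks],
                      [other[o][1] for _, o in landmarks])
    matches = {}
    for m_id, (x, y) in master.items():
        px, py = ax * x + bx, ay * y + by    # predicted position in `other`
        best_id, best_dist = None, tolerance
        for o_id, (ox, oy) in other.items():
            dist = math.hypot(ox - px, oy - py)
            if dist <= best_dist:
                best_id, best_dist = o_id, dist
        if best_id is not None:
            matches[m_id] = best_id
    return matches


if __name__ == "__main__":
    master = {1: (10, 10), 2: (50, 10), 3: (10, 60), 4: (30, 30), 5: (45, 55)}
    other = {"a": (14, 9), "b": (58, 9), "c": (14, 64),
             "d": (36, 31), "e": (52.5, 58.5), "f": (90, 90)}
    print(match_spots(master, other, [(1, "a"), (2, "b"), (3, "c")]))
```

The landmarks must differ in both x and y for the fit to be defined. The nearest-neighbour step does not enforce one-to-one pairing, so a production system would need a conflict-resolution step and a local, rather than global, distortion model.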

Once a standard map of a given protein sample is made, 
one can enter qualitative annotations to make a reference 
database. Our master 2-dimensional gel database of trans- 
formed human amnion cell (AMA) proteins (20) lists 3430 
polypeptides of which 2592 correspond to cellular components,
having pI's ranging from 4 to 13 and molecular
weights between 8.5 and 230 kDa. The most abundant pro- 
teins in the database correspond to total actin (3.87% of total 
protein; about 90 million molecules per cell) while the 
least abundant of the recorded polypeptides are present in
the vicinity of 5000 molecules per cell. Some annotation 
categories we are using to establish the master AMA data- 
base include: 1) protein identification (comigration with 
purified proteins, 2-dimensional immunoblotting, microse- 
quencing); 2) amounts (total amounts and levels of synthe- 
sis); 3) subcellular localization (nuclear, cytoskeletal, mem- 
brane, membrane receptors, specific organelles, etc.); 4) 
antibodies; 5) posttranslational modifications (phosphoryla- 
tion, glycosylation, methylation etc.); 6) microsequencing; 7) 
cell cycle specificity (specific variations in levels of synthesis 
and amount); 8) regulatory behavior (effect of hormones, 
growth factors, heat shock, etc.); 9) rate of synthesis in normal
and transformed cells (proliferation sensitive proteins,
cell cycle specific proteins, oncogenes, components of the 
pathway (or pathways) that control cell proliferation); 10) 
function (mainly from comigration with proteins of known 
function); 11) sets of proteins that are coordinately regulated 
(hierarchy of controls, differential gene expression in various 
cells, etc.); 12) cDNAs (cloned cDNAs); 13) proteins that are 
specific to a given disease (systematic comparison of protein 
patterns of fibroblast proteins from healthy and diseased in- 
dividuals); 14) expression and exploitation of transfected 
cDNAs; 15) pathways (metabolic, others); 16) gene localization 
(genetic and physical); 17) effect of microinjected antibody 
on patterns of protein synthesis; and 18) secreted proteins. 

Information entered for any spot in a given annotation 
category can be easily retrieved by asking the computer to 
display the information on the color screen. For example, 
Fig. 2E shows a synthetic image of a NEPHGE gel (master 
AMA database) displaying the information contained under 
the entry glycolytic pathway. Alternatively, one can use the 
function peruse annotations for spot to directly ask the com- 
puter to list all the entries available for a particular protein. 
By clicking the mouse in a given entry (in this case, presence 
in fetal human tissues) it is possible to take a quick look at 
the information in that particular entry (Fig. 2F). 
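
The two retrieval modes described above (display every spot that has an entry under one category, or list every entry available for one spot) map naturally onto a keyed record structure. The sketch below is only an illustration of that idea: the class and method names are hypothetical and do not describe the PDQUEST data model. The lipocortin V values are taken from Table 1; the second spot and its number are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Spot:
    """One polypeptide spot in a gel database (field names are illustrative)."""
    ssp: int        # spot number assigned by the computer
    gel: str        # "IEF" or "NEPHGE"
    annotations: dict = field(default_factory=dict)


class GelDatabase:
    def __init__(self):
        self._spots = {}

    def add(self, spot):
        self._spots[(spot.gel, spot.ssp)] = spot

    def annotate(self, gel, ssp, category, value):
        self._spots[(gel, ssp)].annotations[category] = value

    def spots_in_category(self, category):
        """All spot numbers carrying an entry under `category`."""
        return sorted(s.ssp for s in self._spots.values()
                      if category in s.annotations)

    def peruse_annotations(self, gel, ssp):
        """List the categories available for one spot."""
        return sorted(self._spots[(gel, ssp)].annotations)


def example_database():
    """Small database: lipocortin V (Table 1) plus one invented spot."""
    db = GelDatabase()
    db.add(Spot(8216, "IEF"))
    db.annotate("IEF", 8216, "protein name", "lipocortin V")
    db.annotate("IEF", 8216, "isoelectric point", 4.76)
    db.annotate("IEF", 8216, "apparent molecular weight (kDa)", 33.3)
    db.add(Spot(1234, "NEPHGE"))    # hypothetical spot number
    db.annotate("NEPHGE", 1234, "glycolytic pathway", True)
    return db


if __name__ == "__main__":
    db = example_database()
    print(db.spots_in_category("glycolytic pathway"))
    print(db.peruse_annotations("IEF", 8216))
```

Keying spots by gel type and spot number keeps the category lookup independent of how many annotation categories are eventually added.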

A major obstacle encountered in building comprehensive 
2-dimensional gel protein databases is identifying the large 
number of proteins separated by this technology. In our 
databases (20, 21), known proteins are identified by one or 
a combination of the following procedures: 1) comigration
with known proteins, 2) 2-dimensional gel immunoblotting
using specific antibodies, and 3) microsequencing of
Coomassie Brilliant Blue stained human proteins recovered
from dried 2-dimensional gels (see next section). Protein 
identification by means of microsequencing may be difficult, 
as individual protein members of families with short peptide 
differences may escape detection. In the gene-protein data- 
base of E. coli K-12 (14, 23), another major 2-dimensional gel 
database available at present, proteins are being identified by 
a wider range of tests that include comigration with purified 
proteins; genetic criterion (deletion, insertion, frameshift, 
nonsense, missense, regulatory), plasmid-bearing strains 
and in vitro synthesis of protein; selective labeling (methyla- 
tion, phosphorylation); peptide map similarity; and physio- 
logical criterion and selective derivatization. 



So far we have received nearly 550 antibodies from laboratories
all over the world and these are being systematically
tested by 2-dimensional gel immunoblotting for antigen determination.
Similarly, purified proteins and organelles
provided by several laboratories have greatly aided identification
of unknown proteins (20, 21). We routinely request antibodies
and protein samples and promise the donors to make
available all the information we may have accumulated on that
particular protein. For example, Table 1 lists entries available
for Lipocortin V (IEF SSP 8216), also known as annexin
V, VAC-α, endonexin II, renocortin, chromobindin-5, anticoagulant
protein, PAP-I, γ-calcimedin, IBC, calphobindin,
and anchorin CII.

As mentioned previously, one distinct advantage of 
2-dimensional gel electrophoresis is the possibility of study- 
ing quantitative variations in cellular protein patterns that 
may lead to identification of groups of proteins that are ex- 
pressed coordinately during a given biological process. 
Quantitation, however, is not an easy task as reflected by the 
lack of published data on global cellular protein patterns. We 
believe this is partly due to difficulties in obtaining sets of 
gels that are suitable for computer analysis (streaking, 
material remaining at the origin, etc.) as well as to limita- 
tions (laborious editing time, need of calibration strips to 
merge images, limited dynamic range, etc.) in the computer 
analysis systems available at the moment. Perhaps the most 
advanced quantitative studies published so far using com- 
puter analysis have been carried out by Garrels and co- 
workers (18, 22). In particular, these investigators have estab- 
lished a quantitative rat protein database (18, 22) designed 
to study growth control (proliferation, growth inhibitors, and 
stimulation) and transformation in well-defined groups of 
cell lines obtained by transformation of rat REF52 cells with 
SV40, adenovirus, and the Kirsten murine sarcoma virus. 
These studies have revealed clusters of proteins induced or 
repressed during growth to confluence as well as groups of 
transformation-sensitive proteins that respond in a differen- 
tial fashion to transformation by DNA and RNA viruses. A 
most interesting feature of this quantitative database is the 
discovery of a group of coregulated proteins that show similar
expression patterns as the cell cycle-regulated DNA replication
protein known as proliferating cell nuclear antigen
(PCNA)/cyclin (45).

In our human databases, most quantitations have been 
carried out by estimating the radioactivity contained in the 
polypeptides by direct counting of the gel pieces in a scintil- 
lation counter (20, 21). Up to 700 proteins can be cut out 
through appropriate exposed films in a period of time com- 
parable to that required for editing a synthetic image. 
Manual quantitation of this large number of spots is difficult 
without the assistance of a master reference image and a 
numbering system that can be used to identify the spots. Us- 
ing this approach, we have recorded quantitative changes in 
the relative abundance of 592 [35S]methionine-labeled proteins
synthesized by quiescent, proliferating, and SV40
transformed human embryonic lung MRC-5 fibroblasts (21). 
Some data concerning cytoskeletal and cytoskeletal-related 
proteins are presented in Fig. 2G. Our studies as well as 
those of Garrels and co-workers (18, 22) may in the long run 
help define patterns of gene expression that are characteristic 
of the transformed state. 
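
Relative levels such as those reported for quiescent, proliferating, and transformed cells can be derived from scintillation counts by expressing each spot as a fraction of the total label in its gel and then scaling to a reference state. The sketch below assumes that normalization scheme; the text does not spell out the exact calculation, and the dpm values are invented purely so that the output reproduces the ratios listed for lipocortin V in Table 1.

```python
def relative_synthesis(dpm_by_state, reference="proliferating"):
    """Spot label as a fraction of total label, scaled to a reference state.

    dpm_by_state: {state: (spot_dpm, total_dpm_in_gel)}
    Returns {state: level relative to `reference`, rounded to 2 decimals}.
    """
    fractions = {state: spot / total
                 for state, (spot, total) in dpm_by_state.items()}
    ref = fractions[reference]
    return {state: round(frac / ref, 2) for state, frac in fractions.items()}


if __name__ == "__main__":
    # HYPOTHETICAL counts, chosen only to illustrate the calculation.
    counts = {
        "quiescent": (1100, 1_000_000),
        "proliferating": (2000, 2_000_000),
        "transformed": (450, 1_500_000),
    }
    print(relative_synthesis(counts))
```

Dividing by the total label in each gel corrects for differences in the amount of sample loaded, so the final ratios compare rates of synthesis rather than raw counts.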

OTHER 2-DIMENSIONAL GEL PROTEIN 
DATABASES 

As mentioned previously there are other 2-dimensional gel 
databases available in computer form that have been pub- 
TABLE 1. Some entries for lipocortin V in the human AMA 2-dimensional gel protein database

Entries for lipocortin V (IEF SSP 8216), each followed by the information entered:

1. Protein name: Lipocortin V, renocortin, chromobindin-5, endonexin II, anticoagulant protein, PAP-I, VAC-α, 35-γ-calcimedin, IBC, calphobindin I, anchorin CII, annexin V
2. Percentage of total protein: 0.110% (about 2,800,000 molecules per cell)
3. Apparent molecular weight (Mr): 33.3 kDa
4. Isoelectric point (pI): 4.76
5. Method (or methods) of identification: Microsequencing, 2-dimensional immunoblotting, comigration
6. Credit to investigators that aided in identification: G. Bauw, J. Vandekerckhove, and colleagues, Rijksuniversiteit Gent; B. Pepinsky, BIOGEN, Cambridge; N.G. Ahn, University of Washington
7. Antibody against protein: Polyclonal (rabbit, antibody no. 20), B. Pepinsky, BIOGEN, Cambridge
8. Comigration with human proteins: Lipocortin V, N.G. Ahn, Howard Hughes Medical Institute, Washington University
9. Cellular localization: Subcortical membrane
10. Calcium/phospholipid-dependent membrane proteins: Lipocortin V
11. Function: Regulation of various aspects of inflammation, immune response, blood coagulation, and differentiation
12. Partial amino acid sequence: GTVTDFPGFDER (7-18), VLTEIIASR (109-117), QVYEEEYGSSLEDDVVG (127-143), ?GTDEEKFITIFGT(R) (187-201)
13. cDNA sequence: Known. R. Blake et al., J. Biol. Chem. 263, 10799-10811; 1988 (pI = 4.76 from translated sequence)
14. Levels in fetal human tissues: Adrenal glands = +++; brain = +++; cerebellum = +++; ear = +++; eye = +++; heart = +++; hypophysis = +++; liver = +++; lung = +++; meninges = +++; mesonephric tissue = +++; striated muscle = +++; pancreas = +++; skin = +++; spleen = +++; stomach = +++; submandibular gland = +++; small intestine = +++; thymus = +++; thyroid gland = +++; tongue = +++; ureter = +++
15. Levels in quiescent, proliferating, and transformed MRC-5 fibroblasts: Q (quiescent) = 1.1; P (proliferating) = 1.0; T (SV40 transformed) = 0.3
16. Distribution in Triton supernatant and cytoskeletons: Mainly supernatant



lished in extenso: these correspond to the E. coli K-12 
protein-gene database (14, 23) and to the rat REF52 database
(18, 22).

The E. coli K-12 cellular protein-gene database is perhaps
the most complete of all databases reported so far and eventually
it should trace each protein back to its structural gene.
Information contained in this database includes: gene/protein
name (protein name, EC number, gene name);
2-dimensional gel spot designations (x-y coordinates from
reference gels, alphanumeric designation); genetic information
(linkage map location, physical map location, GenBank
code, sequence reference, location on Kohara clones); biochemical
information (molecular weight, pI, number of
residues of each amino acid, mole percent of each amino
acid, total number of amino acids in a polypeptide); and
regulatory information (cellular level of protein in different
media and different temperature, member of regulon, member
of stimulon). Major advances of this database are envisaged
in the future in view of the imminent sequencing of
the whole E. coli genome as well as the development of improved
methods to express cloned genes.

The rat REF52 2-dimensional gel protein database lists 
about 1600 proteins that have been recorded using the 
QUEST analysis system (18, 22). Included in this quantita- 
tive database are 1) protein names (cytoskeletal and heat 
shock proteins as well as various nuclear, mitochondrial, and 
cytoplasmic proteins), 2) annotations (subcellular localiza- 
tion, modification, recognition by specific antibodies, 
coprecipitation, NH2-terminal sequence, cross-reference to
protein sequence information and references to the litera- 
ture), 3) protein sets (cytoskeletal proteins, phosphoproteins, 
sets of proteins with PCNA/cyclin-like properties, etc.) and 
4) general quantitative data (protein synthesis during growth 
of normal REF52 cells to confluence and quiescence, and af- 
ter restimulation of growth-inhibited cells). 

In addition to the 2-dimensional gel databases mentioned
so far there are several smaller cellular databases being established
in human (normal human diploid fibroblasts, lymphocytes,
leukocytes, leukemic cells), mouse (NIH/3T3 cells,
T lymphocytes), Aplysia, yeast (Saccharomyces cerevisiae), plants
(wheat, barley, sorghum), and Euglena. Databases of tissue
proteins (brain, whole mouse, liver) and body fluid proteins
(plasma proteins, cerebrospinal fluid, urine, and milk) are
being established in several laboratories. The reader is
directed to the review by Celis et al. (4) for details and references
concerning these databases.

MICROSEQUENCING HAS ADDED A NEW 
DIMENSION TO COMPREHENSIVE 
2-DIMENSIONAL GEL DATABASES: A DIRECT 
LINK BETWEEN PROTEINS AND GENES 

The development of highly sensitive amino acid gas-phase or 
liquid-phase sequenators (24), together with the establish- 
ment of efficient protein and peptide sample preparation 
methods, has opened the possibility to perform a systematic 
sequence analysis of proteins resolved by 2-dimensional gel 
electrophoresis. Indeed, generated pieces of protein se- 
quences can be used to search for protein identity (compari- 
son with available sequences stored in databanks) as well as 
for preparing specific DNA probes for cloning of as yet un- 
characterized proteins (Fig. 1). In addition, partial protein 
sequences can be stored in 2-dimensional gel databases (for 
example, see Fig. 2H) and offer a unique link between pro- 
teins and genes (Fig. 1). 
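
Searching a sequence collection with a partial protein sequence can be sketched as a simple pattern match. The example below assumes that "?" marks an unidentified residue and that a residue in parentheses is a tentative call, which is how the partial sequences in Table 1 appear to be written; both are treated as wildcards. The database is a toy one whose filler residues are invented, and the function names are hypothetical.

```python
import re


def peptide_to_regex(peptide):
    """Compile a partial sequence into a regex.

    '?' (unknown residue) and '(X)' (tentative residue) both match any
    single residue; every other character must match exactly.
    """
    pattern = []
    i = 0
    while i < len(peptide):
        ch = peptide[i]
        if ch == "?":
            pattern.append(".")
            i += 1
        elif ch == "(":
            close = peptide.index(")", i)
            pattern.append(".")    # tentative call: accept any residue
            i = close + 1
        else:
            pattern.append(re.escape(ch))
            i += 1
    return re.compile("".join(pattern))


def search(peptide, database):
    """Return [(protein name, first residue, last residue)], 1-based positions."""
    regex = peptide_to_regex(peptide)
    hits = []
    for name, sequence in database.items():
        for m in regex.finditer(sequence):
            hits.append((name, m.start() + 1, m.end()))
    return hits


# TOY database: the "A" filler residues are invented for illustration.
TOY_DB = {
    "toy_protein_1": "AAAAAA" + "GTVTDFPGFDER" + "AAAA" + "GTDEEKFITIFGTR",
    "toy_protein_2": "MKKLLVAAGSSLE",
}

if __name__ == "__main__":
    print(search("GTVTDFPGFDER", TOY_DB))
    print(search("?GTDEEKFITIFGT(R)", TOY_DB))
```

Exact matching with wildcards is enough to find identities; detecting homology with related but non-identical proteins would require a scoring scheme that tolerates substitutions, which this sketch does not attempt.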

In the early 1970s gel electrophoresis was used to purify 
proteins for sequencing purposes (reviewed by Weber and 
Osborn in ref 25). Proteins were recovered by diffusion and 
sequenced by the manual dansyl-Edman degradation at the 
nanomole level. This technique was further refined by using 
electro-elution to recover proteins and by miniaturizing the 
system (26). This method has been used extensively, but 
showed increasing drawbacks (low yields, protein samples 
contaminated by free amino acids, and NH2-terminal blocking)
as the amounts of handled protein gradually became
smaller (e.g., at the 10 picomol level). 

Most of the problems referred to above have been 
minimized with the introduction of protein-electroblotting 
procedures (27-32). When proteins are blotted on chemi- 
cally inert membranes, it is possible to sequence the immobi- 
lized proteins directly without additional manipulations. 
Thus, depending on the amount of bound protein and its nature,
this direct sequencing procedure generally yields NH2-terminal
sequences containing 10-40 residues. As such, this
technique was used to identify, by their NH2-terminal sequences,
differentially expressed major proteins from total
cellular extracts separated on 2-dimensional gels. A major 
difficulty encountered in this procedure is the occurrence of 
frequent artefactual blockage of the proteins. Several studies 
suggest that this phenomenon is mainly due to reaction with 
contaminants (particularly unpolymerized acrylamide 
present in the gel) and to a high dilution of the protein (low 
concentration of the protein per unit membrane surface). In 
addition to this primarily technical problem, many proteins 
are blocked in vivo by acylation or by a pyrrolidone carboxylic
acid cap.

The problem of partial or complete NH 2 -terminal block- 
age can be circumvented by generating internal amino acid 
sequences. This is achieved by fragmenting the protein 
present in the gel (gel in situ cleavage) or by cleaving it while 
bound to the membrane (membrane in situ cleavage) 
(33-35). In both cases, proteins are either cleaved in a res- 
tricted way (e.g., by limited enzymatic digestion or by using 
restriction chemical cleavage conditions) or fragmented into 
smaller peptides. 



Of the different combinations examined, we had good
results by using exhaustive proteolytic digestion on
membrane-immobilized proteins. This method has been
described for Ponceau red-stained proteins on nitrocellulose
blots (34), for Amido black-stained Immobilon-bound
proteins, and for fluorescamine-detected proteins on glass fiber
membranes (35). The proteases used (trypsin, chymotrypsin,
or pepsin) cleave at multiple sites, generating small peptides
that elute from the blot into the digestion buffer from which
they are purified by reversed-phase high performance liquid 
chromatography (HPLC) before being sequenced individu- 
ally. Although each of these manipulations could be expected 
to result in a reduced yield of final sequence information, we 
were surprised that the peptides could be sequenced with 
high efficiency. In our hands, this approach could be routinely
applied to gel-purified proteins available in amounts
ranging from 5 to 10 µg, and often yielded sequence information
covering more than 30% of the total protein. As
membrane-immobilized proteins are not homogeneously 
digested, but rather show protease sensitivity next to resis- 
tant regions, the number of peptides generated is much lower 
than expected from the number of potential cleavage sites. 
Consequently, HPLC peptide chromatograms are less complex
and most peptides can be recovered in pure form.

As only limited amounts of a protein mixture can be 
loaded on a 2-dimensional gel, proteins of interest are often
obtained in yields insufficient for the currently available
sequencing technology. More material can be obtained by
enriching for a certain subcellular fraction (purified cell
organelles) or by exploiting affinity (dyes, metals, drugs, etc) or
hydrophobic properties of proteins before gel analysis. All of 
the sequencing results accumulated so far in the human pro- 
tein database (20) (a few are shown in Fig. 2H) have been 
obtained from analysis of protein spots collected from 
2-dimensional gels that had been stained with Coomassie 
blue according to standard procedures and dried for storage. 
Proteins are recovered from the collected gel pieces by a 
protein-elution-concentration device, combined with gel 
electrophoresis and electroblotting. Details of this technique 
have been reported in a previous communication (42) and a 
brief outline is given below. 

Combined gel pieces are allowed to swell in gel sample 
buffer (a total volume of 1.5 ml). The gel pieces combined 
with the supernatant are then collected into a large slot made 
in a new gel. The slot is further filled with Sephadex G-10 
equilibrated in gel sample buffer. During consecutive gel 
electrophoresis, most of the electrical current passes on the 
side of the slot instead of passing through the slot. This 
results in both a vertical stacking and horizontal contraction 
of the protein band. With this device the protein is efficiently 
eluted from the gel pieces and concentrated from a large 
volume into a narrow spot. The highly concentrated (about 
5 mm 2 ) protein spot is then electroblotted on PVDF- 
membranes, stained with Amido black, and in situ digested 
with trypsin. The peptides generated during digestion elute 
from the membrane into the supernatant, and can be sepa- 
rated by narrow bore reversed-phase HPLC and collected in- 
dividually for sequence analysis. 

Using this and previous procedures (37, 39, 42), we have 
so far analyzed 70 protein spots collected from 
2-dimensional gels (20, and unpublished observations) (see 
for example Fig. 2H). The sequence information amounts to 
2100 allocated residues corresponding to an average of 30 
residues per protein spot. So far we have made cDNAs of 
many of the unknown proteins that have been microsequenced,
and a substantial number has been cloned and sequenced.
All available information indicates that it may be
possible to obtain partial sequence information from most of 
the proteins that can be visualized by Coomassie Brilliant
Blue staining.

Partial protein sequences are stored in the database as dis- 
played in Fig. 2H, and it should be possible in the near fu- 
ture to interface this information with forthcoming DNA se- 
quence data from the human genome project. In the long 
run, as the human genome sequences become available, it
will be possible to assign partial protein sequences to genes
for which the full DNA sequence and chromosomal location
are known (Fig. 1). 

SUMMARY 

The studies presented in this brief review are intended to 
demonstrate the usefulness of computer-aided 2-dimensional 
gel electrophoresis and microsequencing to analyze cellular 
protein patterns, and to link protein and DNA information. 
As more information is gathered worldwide, comprehensive 
databases will depict an integrated picture of the expression 
levels and properties of the thousands of proteins that orches- 
trate most cellular functions. 

Clearly, databases allow easy access to a large body of data
and provide an efficient medium to communicate stan- 
dardized protein information. In the future, databases will 
foster a wide variety of biological information that can be 
used to support collaborative research projects in basic and 
applied biology as well as in clinical research (2, 5, 46). Once 
a protein is identified in a particular database all the information
gathered on it can be made available to the scientist.
However, many problems must be solved before protein 
databases become of general use to the scientific community. 
A most urgent one is to promote standardization of the gel 
running conditions so that data produced in a given labora- 
tory may be used worldwide. Surprisingly, the gel running 
technology as it stands today is still a craft.

Finally, comprehensive, computerized databases of pro- 
teins, together with recently developed techniques to 
microsequence proteins, offer a new dimension to the study 
of genome organization and function (Fig. 1). In particular, 
human protein databases may become increasingly impor- 
tant in view of the concerted effort to map and sequence the 
entire human genome. This formidable task is expected to 
dominate biological research in the next decades.

We would like to thank S. Himmelstrup Jørgensen for typing the
manuscript and O. Sonderskov for photography. Work in the
authors' laboratories was supported by grants from the Danish
Biotechnology Programme, the Danish Cancer Foundation, and the
Commission of the European Communities.

REFERENCES 

1. O'Farrell, P. H. (1975) High resolution two-dimensional
electrophoresis of proteins. J. Biol. Chem. 250, 4007-4021

2. Special Issue: Two-dimensional gel electrophoresis. Clin. Chem.
28, 1982

3. Celis, J. E., and Bravo, R., eds. (1984) Two-Dimensional Gel
Electrophoresis of Proteins: Methods and Applications. Academic, New
York

4. Celis, J. E., Madsen, P., Gesser, B., Kwee, S., Nielsen, H. V.,
Rasmussen, H. H., Honore, B., Leffers, H., Ratz, G. P., Basse,
B., Lauridsen, J. B., and Celis, A. (1989) Protein databases
derived from the analysis of two-dimensional gels. In Advances in
Electrophoresis (Chrambach, C., ed) VCH, Weinheim, Germany

5. Special Issue: Two-dimensional gel electrophoresis in cell
biology. (Celis, J. E., ed) Electrophoresis 11, 1990



6. Celis, J. E., Honore, B., Bauw, G., and Vandekerckhove, J.
(1990) Comprehensive computerized 2D gel protein databases
offer a global approach to the study of the mammalian cell.
BioEssays 12, 93-98

7. Garrels, J. I. (1983) Two-dimensional gel electrophoresis and
computer analysis of proteins synthesized by cloned cell lines.
Methods Enzymol. 100, 411-423

8. Anderson, N. L., Hofmann, J. P., Gemmel, A., and Taylor, S.
(1984) Global approaches to the quantitative analysis of
gene-expression patterns observed by two-dimensional gel
electrophoresis. Clin. Chem. 30, 2031-2036

9. Garrels, J. I., Farrar, J. T., and Burwell, C. B. (1984) The Quest
system for computer-analyzed two-dimensional electrophoresis
of proteins. In Two-Dimensional Gel Electrophoresis of Proteins:
Methods and Applications (Celis, J. E., and Bravo, R., eds) pp.
37-91, Academic, New York

10. Vincens, P., and Tarroux, P. (1988) Two-dimensional
electrophoresis computerized processing. Int. J. Biochem. 20,
499-509

11. Appel, R., Hochstrasser, D., Roch, C., Funk, M., Muller, A. F.,
and Pellegrini, C. (1988) Automatic classification of
two-dimensional gel electrophoresis pictures by heuristic clustering
analysis: a step toward machine learning. Electrophoresis 9,
136-142

12. Lemkin, P. F., and Lester, E. P. (1989) Database and search
techniques for two-dimensional gel protein data: a comparison
of paradigms for exploratory data analysis and prospects for
biological modeling. Electrophoresis 10, 122-140

13. Miller, M. J. (1989) Computer-assisted analysis of
two-dimensional gel electrophoretograms. Adv. Electrophoresis 3,
182-217

14. Phillips, T. D., Vaughn, V., Bloch, P. L., and Neidhardt, F. C.
(1987) In Escherichia coli and Salmonella typhimurium: Cellular and
Molecular Biology, Gene-Protein Index of Escherichia coli K-12, 2 ed.
(Neidhardt, F. C., Ingraham, J. L., Low, K. B., Magasanik, B.,
Schaechter, M., and Umbarger, H. E., eds) pp. 919-966, American
Society for Microbiology, Washington, D.C.

15. Celis, J. E., Ratz, G. P., Celis, A., Madsen, P., Gesser, B., Kwee,
S., Madsen, P. S., Nielsen, H. V., Yde, H., Lauridsen, J. B., and
Basse, B. (1988) Towards establishing comprehensive databases
of cellular proteins from transformed human epithelial amnion
cells (AMA) and normal peripheral blood mononuclear cells.
Leukemia 9, 561-601

16. Special Issue: Protein databases in two-dimensional
electrophoresis. (Celis, J. E., ed) Electrophoresis 2, 1989

17. Celis, J. E., Ratz, G. P., Madsen, P., Gesser, B., Lauridsen, J. B.,
Brogaard-Hansen, K. P., Kwee, S., Rasmussen, H. H., Nielsen,
H. V., Cruger, D., Basse, B., Leffers, H., Honore, B., Moller,
O., and Celis, A. (1989) Computerized, comprehensive databases
of cellular and secreted proteins from normal human embryonic
lung MRC-5 fibroblasts: identification of transformation
and/or proliferation sensitive proteins. Electrophoresis 10,
76-115

18. Garrels, J. I., and Franza, B. R. (1989) The REF52 protein
database. Methods of database construction and analysis using
the Quest system and characterizations of protein patterns from
proliferating and quiescent REF52 cells. J. Biol. Chem. 264,
5283-5298

19. Celis, J. E., Cruger, D., Kiil, J., Dejgaard, K., Lauridsen, J. B.,
Ratz, G. P., Basse, B., Celis, A., Rasmussen, H. H., Bauw, G.,
and Vandekerckhove, J. (1990) A two-dimensional gel protein
database of noncultured total normal human epidermal
keratinocytes: identification of proteins strongly up-regulated in
psoriatic epidermis. Electrophoresis 11, 242-254

20. Celis, J. E., Gesser, B., Rasmussen, H. H., Madsen, P., Leffers,
H., Dejgaard, K., Honore, B., Olsen, E., Ratz, G., Lauridsen,
J. B., Basse, B., Mouritzen, S., Hellerup, M., Andersen, A.,
Walbum, E., Celis, A., Bauw, G., Puype, M., Van Damme, J.,
and Vandekerckhove, J. (1990) Comprehensive two-dimensional
gel protein databases offer a global approach to the analysis of
human cells: the transformed amnion cells (AMA) master database
and its link to genome DNA sequence data. Electrophoresis
12, 989-1071



21. Celis, J. E., Dejgaard, K., Madsen, P., Leffers, H., Gesser, B.,
Honore, B., Rasmussen, H. H., Olsen, E., Lauridsen, J. B.,
Ratz, G., Mouritzen, S., Hellerup, M., Andersen, A., Walbum,
E., Celis, A., Bauw, G., Puype, M., Van Damme, J., and
Vandekerckhove, J. (1990) The MRC-5 human embryonal lung
fibroblast two-dimensional gel cellular protein database:
quantitative identification of polypeptides whose relative abundance
differs between quiescent, proliferating and SV40 transformed
cells. Electrophoresis 12, 1072-1113

22. Garrels, J. I., Franza, B. R., Chang, C., and Latter, G. (1990)
Quantitative exploration of the REF52 protein database: cluster
analysis reveals the major protein expression profiles in
responses to growth regulation, serum stimulation, and viral
transformation. Electrophoresis 12, 1114-1130

23. Van Bogelen, R. A., Hutton, M. E., and Neidhardt, F. C. (1990)
Gene-protein database of Escherichia coli K-12, 3rd ed.
Electrophoresis 12, 1131-1166

24. Hewick, R. M., Hunkapiller, M. W., Hood, L. E., and Dreyer,
W. J. (1981) A gas-liquid solid phase peptide and protein
sequenator. J. Biol. Chem. 256, 7990-7997

25. Weber, K., and Osborn, M. (1985) In The Proteins and Sodium
Dodecyl Sulfate: Molecular Weight Determination on Polyacrylamide Gels
and Related Procedures (Neurath, H., et al., eds) Vol. 1, pp.
179-223, Academic, New York

26. Hunkapiller, M. W., Lujan, E., Ostrander, F., and Hood, L. E.
(1983) Isolation of microgram quantities of proteins from
polyacrylamide gels for amino acid sequence analysis. Methods
Enzymol. 91, 227-236

27. Vandekerckhove, J., Bauw, G., Puype, M., Van Damme, J., and
Van Montagu, M. (1985) Protein-blotting on polybrene-coated
glass-fiber sheets. Eur. J. Biochem. 152, 9-19

28. Aebersold, R. H., Teplow, D. B., Hood, L. E., and Kent, S. B. H.
(1986) Electroblotting onto activated glass. J. Biol. Chem. 261,
4229-4238

29. Bauw, G., De Loose, M., Inze, D., Van Montagu, M., and
Vandekerckhove, J. (1987) Alterations in the phenotype of plant cells
studied by NH2-terminal amino acid-sequence analysis of
proteins electroblotted from two-dimensional gel-separated total
extracts. Proc. Natl. Acad. Sci. USA 84, 4806-4810

30. Matsudaira, P. (1987) Sequence from picomole quantities of
proteins electroblotted onto polyvinylidene difluoride
membranes. J. Biol. Chem. 262, 10035-10038

31. Eckerskorn, C., Mewes, W., Goretzki, H., and Lottspeich, F.
(1985) A new siliconized-glass fiber as support for
protein-chemical analysis of electroblotted proteins. Eur. J. Biochem.
176, 509-519

32. Moos, M., Jr., Nguyen, N. Y., and Liu, T.-Y. (1988) Reproducible
high yield sequencing of proteins electrophoretically
separated and transferred to an inert support. J. Biol. Chem. 263,
6005-6008

33. Kennedy, T. E., Gawinowicz, M. A., Barzilai, A., Kandel, E. R.,
and Sweatt, J. D. (1988) Sequencing of proteins from
two-dimensional gels by using in situ digestion and transfer of
peptides to polyvinylidene difluoride membranes: application to
protein associated with sensitization in Aplysia. Proc. Natl. Acad. Sci.
USA 85, 7008-7012



34. Aebersold, R. H., Leavitt, J., Saavedra, R. A., Hood, L. E., and
Kent, S. B. H. (1987) Internal amino acid sequence analysis of
proteins separated by one- or two-dimensional gel electrophoresis
after in situ protease digestion on nitrocellulose. Proc. Natl.
Acad. Sci. USA 84, 6970-6974

35. Bauw, G., Van Den Bulcke, M., Van Damme, J., Puype, M.,
Van Montagu, M., and Vandekerckhove, J. (1988) Protein
electroblotting on polybase-coated glassfiber and polyvinylidene
difluoride membranes: an evaluation. J. Prot. Chem. 7, 194-196

36. Celis, J. E., Ratz, G. P., Madsen, P., Gesser, B., Lauridsen,
J. B., Leffers, H., Rasmussen, H. H., Nielsen, H. V., Cruger,
D., Basse, B., Honore, B., Moller, O., Celis, A.,
Vandekerckhove, J., Bauw, G., Van Damme, J., Puype, M., and Van Den
Bulcke, M. (1989) Comprehensive, human cellular protein
databases and their implication for the study of genome
organization and function. FEBS Lett. 244, 247-254

37. Bauw, G., Van Damme, J., Puype, M., Vandekerckhove, J.,
Gesser, B., Lauridsen, J. B., Ratz, G. P., and Celis, J. E. (1989)
Protein-electroblotting and -microsequencing strategies in
generating protein databases from two-dimensional gels. Proc.
Natl. Acad. Sci. USA 86, 7701-7705

38. Aebersold, R., and Leavitt, J. (1990) Sequence analysis of
proteins separated by polyacrylamide gel electrophoresis. Towards
an integrated protein database. Electrophoresis 11, 517-527

39. Bauw, G., Rasmussen, H. H., Van Den Bulcke, M., Van
Damme, J., Puype, M., Gesser, B., Celis, J. E., and
Vandekerckhove, J. (1990) Two-dimensional gel electrophoresis,
protein electroblotting and microsequencing: a direct link
between proteins and genes. Electrophoresis 11, 528-536

40. Tempst, P., Link, A. J., Riviere, L. R., Fleming, M., and
Elicone, C. (1990) Internal sequence analysis of protein separated
on polyacrylamide gels at the submicrogram level: improved
methods, applications and gene cloning strategies. Electrophoresis
11, 537-553

41. Eckerskorn, C., and Lottspeich, F. (1990) Combination of
two-dimensional gel electrophoresis with microsequence and amino
acid composition analysis: improvement of speed and sensitivity
in protein characterization. Electrophoresis 11, 554-561

42. Rasmussen, H. H., Van Damme, J., Bauw, G., Puype, M.,
Gesser, B., Celis, J. E., and Vandekerckhove, J. (1991) In Methods
in Protein Sequence Analysis (Jornvall, H., and Hoog, J. O., eds)
pp. 103-114, Eighth International Conference on Methods in
Protein Sequence Analysis. Birkhauser Verlag, Boston

43. Olson, A. D., and Miller, M. J. (1988) Elsie 4: quantitative
computer analysis of sets of two-dimensional gel
electrophoretograms. Anal. Biochem. 169, 49-70

44. Vincens, P., Paris, N., Pujol, J. L., Gaboriaud, C., Rabilloud,
T., Pennetier, J., Matherat, P., and Tarroux, P. (1986)
HERMeS: a second generation approach to the automatic
analysis of two-dimensional electrophoresis gels. Part I: Data
acquisition. Electrophoresis 7, 347-356

45. Celis, J. E., Madsen, P., Celis, A., Nielsen, H. V., and Gesser,
B. (1987) Cyclin (PCNA, auxiliary protein of DNA polymerase-δ)
is a central component of the pathway(s) leading to DNA
replication and cell division. FEBS Lett. 220, 1-7

46. Anderson, N. G., and Anderson, N. L. (1982) The human
protein index. Clin. Chem. 28, 739-748



Vol. 5 May 1991, The FASEB Journal



Electrophoresis 1993, 14, 1045-1053



Preparation of human tumors for analysis by 2-D electrophoresis






Bo Franzen 1
Stig Linder 3
Ken Okuzawa 2
Harabumi Kato 2
Gert Auer 1

1 Division of Tumor Pathology,
Department of Pathology, Division
of Experimental Oncology,
Karolinska Hospital and Institute,
Stockholm, Sweden
2 Tokyo Medical College, Department
of Surgery, Tokyo

3 Division of Experimental Oncology,
Karolinska Hospital and Institute,
Stockholm



Nonenzymatic extraction of cells from clinical tumor 
material for analysis of gene expression by two- 
dimensional polyacrylamide gel electrophoresis 

We have compared different methods of preparation of malignant cells for 
two-dimensional electrophoresis (2-DE). We found all methods using fresh 
tissue to be superior compared to methods using frozen tissue. Our results
indicate that nonenzymatic methods of preparation of tumor cells, including 
fine needle aspiration, scraping and squeezing, have advantages over methods 
using enzymatic extraction of cells. Nonenzymatic methods are rapid, appear
to reduce loss of high molecular protein species, and alleviate the necessity of
separating viable and nonviable cells by Percoll gradient centrifugation. Using
these techniques, high-quality 2-DE maps were derived from tumors of the
lung and breast. In the resulting polypeptide patterns, heat shock proteins, 
non-muscle tropomyosins and intermediate filaments were identified. We
conclude that nonenzymatic extraction of malignant cells from fresh tumor tissue
improves the possibilities that these techniques may be useful in clinical diag- 
nosis. 



1 Introduction 

Tumors may develop by a number of different mechan- 
isms in any given cell type. At the time of diagnosis, 
tumors will have progressed along different pathways to 
various stages of malignancy. To provide a basis for indi- 
vidual therapy it is of importance to examine specific 
properties of the tumor cell population in each patient. 
A large number of different markers have been de- 
scribed in order to increase the diagnostic accuracy. It is 
likely that a combination of several markers is needed
in the future in order to reflect different properties of 
the tumor. One important method for the resolution of a 
large number of potential markers is two-dimensional 
electrophoresis (2-DE). Extensive efforts are being made 
in identifying various polypeptides separated by 2-DE 
and to characterize how the expression of these polypeptides
is affected by the response to cellular transformation
and various culture conditions [1, 2]. It would be of
value to transfer this information to 2-DE separations of 
polypeptides from tumor tissue samples. However, one 
prerequisite is that the quality of the 2-DE gels from 
tumor samples is comparable in quality with 2-DE gels 
from samples of cultured cells. 

Frozen tumor tissues are commonly used for various bio- 
chemical assessments. However, if such samples are ana- 
lyzed by 2-D polyacrylamide gel electrophoresis (PAGE), 
the polypeptide patterns are obscured by contamination 
of serum- and connective tissue proteins. Such nontu- 
mor-cell-related variations represent serious problems in 
the interpretation and inter-patient comparison of 2-DE 



patterns [3]. 2-DE patterns of cells prepared from fresh 
tumor material were analyzed after enzymatic extraction 
of tumor cells [4, 5] or after culturing tumor fragments in 
medium containing radioactive amino acids [6]. These 
procedures may, however, lead to alterations in the gene 
expression/polypeptide patterns. We are only aware of 
one study where nonenzymatic extraction of cells from 
fresh tumor tissue (prostate cancer) was used to prepare 
samples for 2-D PAGE [4]. We have examined enzymatic 
extraction and various nonenzymatic preparation tech- 
niques, including fine needle aspiration, for the prepara- 
tion of cells from fresh tumor tissues. We describe 
nonenzymatic extraction procedures that are rapid, lead 
to high-quality 2-DE patterns, and that alleviate the 
necessity to purify tumor cell populations from dead 
cells. 

2 Materials and methods 

2.1 Cell cultures and samples used for spot 
identification 

A rat embryonal fibroblast cell line, WT2 (a kind gift
from Dr. J. I. Garrels and Dr. S. Pattersson) was used for 
the identification of a number of heat shock and struc- 
tural proteins. Human normal diploid lung fibroblasts, 
WI38, human epithelial breast carcinoma cells, MDA- 
231 and MCF-7 were purchased from ATCC and grown 
as recommended. Polypeptides prepared from a leu- 
kemia type pre-B-ALL were separated by 2-DE. The 
2-DE map was then analyzed by Dr. S. M. Hanash (Uni- 
versity of Michigan, Ann Arbor, USA). 
2.2 Tumor tissue samples

In this study, 2-DE maps from seven tumors were used
as representative illustrations: two adenocarcinomas of
the lung (LA, and LB, mucinous; both cases intermediate
grade of differentiation), one squamous carcinoma
of the lung (LS), one carcinoid-like breast cancer (BC),
one microfollicular adenoma (highly differentiated) of
the thyroid (TA), one highly differentiated hypernephroma,
a tumor of the kidney (KH), and finally one case
of poorly differentiated corpus carcinoma (CP).

2.3 Preparation of cultured cells 

The cell monolayers were washed twice in phosphate
buffered saline (PBS) and then scraped off in ice-cold
PBS including protease inhibitors (PIH):
phenylmethylsulfonyl fluoride (PMSF), 0.2 mM, and benzamidine,
0.83 mM. The cells were pelleted at 660 × g for 3 min (+4°C)
and washed one time before final centrifugation at 2700 × g
for 5 min. The wet weight of the cell pellet was recorded and
the cells were stored at -80°C until further processing.

2.4 Preparation of tumor tissue samples 

2.4.1 General remarks 

Macroscopically representative and non-necrotic tumor 
tissues were selected within 20 min after resection. 
Parallel samples were routinely prepared for cytology. 
The samples were processed as rapidly as possible on ice 
or at +4°C and in the presence of PIH. Cells were
stained with DiffQuick (Baxter) and usually examined on
three different occasions during the preparation
procedure: (i) cytology sample, (ii) extracted cells and (iii)
cells after Percoll gradient centrifugation.

2.4.2 Specimen acquisition 

The strategy of sample preparation is shown in Fig. 1.
Tumor tissue cell samples were usually obtained by fine
needle aspiration (NA) using a 0.7 mm needle. The
syringe was filled with 1-2 mL of ice-cold culture
medium/PIH. We found that if a tumor appeared to be very
fibrous it was difficult to extract enough cells for 2-DE
analysis. In these cases, two alternative techniques were
examined. (i) The tumor was cut in the middle and the
fresh surface scraped (SC) with a scalpel. The cell-rich
material was then transferred to ice-cold culture
medium (L15 with 5% fetal calf serum)/PIH. (ii) A part
of the tumor sample was placed in culture medium on
ice for further processing at the laboratory in the
following way: the material was cut into very small
fragments on a pre-cooled dissection plate and transferred
to a small glass chamber with a 0.7 mm metal net 5 mm
above the bottom of the chamber. Medium/PIH was
added to cover the sample (8 mL), which was gently
squeezed (SQ) towards the net in order to release and
wash out cells. NA and SC were also compared with an
enzymatic extraction (EE) procedure described previously
[5]. Briefly, thin slices of tissue were incubated
with collagenase (1 mg/mL) and elastase (2 mg/mL) in
medium for 1 h at 37°C. Extracted cells from every
sample were then subjected to Percoll gradient
centrifugation (Section 2.4.3).

2.4.3 Separation of cells by Percoll gradient 
centrifugation 

The cell suspension was filtered through two nylon mesh
filters, (i) 250 µm and (ii) 100 µm, and then centrifuged
at 660 × g for 3 min. The cell pellet was resuspended
carefully in medium, using a syringe, and loaded onto a
two-step discontinuous Percoll/PBS gradient, 20%
(density = 1.03 g/mL) and 54% (density = 1.07 g/mL),
and centrifuged at 1000 × g for 15 min. In this system,
dead cells stay on the top, viable cells sediment to the
interphase and erythrocytes sediment to the bottom. The
viability of cells in the top fraction and interphase was
checked by the trypan blue exclusion test. The interphase
cell layer (> 90% viability) was collected and
washed one time in a large volume of PBS/PIH (centrifuged
at 800 × g for 3 min). Finally, the cells were resuspended
in 1.4 mL PBS and pelleted at 2700 × g for 5
min. The wet weight (WW) was recorded and the pellet
was then stored at -80°C.

2.4.4 Final preparation of cells for 2-D PAGE analysis

From this point, cultured cell samples were treated
in the same way as tumor cell samples: Each cell pellet
was thawed on ice and resuspended in 1.89 µL mQ water
per mg WW (= (1.89 × WW) µL). The suspension was
frozen and thawed 4-5 times to break the cells [7]. A
volume of (0.089 × WW) µL 10% sodium dodecyl
sulfate (SDS), including 33.3% mercaptoethanol, was
mixed with the sample and incubated 5 min on ice with
(0.329 × WW) µL of a solution of DNase I (0.144
mg/mL in 20 mM Tris-HCl with 2 mM CaCl2 × 2H2O, pH
8.8) and RNase A (0.0718 mg/mL in Tris) [8, 9]. The sample
was frozen and lyophilized. Sample buffer [10] including



[Figure 1: flow chart; box label: Control of tumor sample representativity]

Figure 1. Experimental flow chart showing main steps of the preparation
procedures. The abbreviations used for nonenzymatic extraction
procedures are: FZ, frozen sample preparation; NA, needle aspiration;
SC, scraped; and SQ, squeezed sample. Extracted cells are then
loaded as a suspension (top volume of each tube) onto either
1.07 g/mL Percoll (left), or a discontinuous Percoll gradient from the
nonenzymatic extraction (middle), or from enzymatic extraction
(right). Cellular top- and interphase fractions are then used for 2-DE.
For details see Section 2.
PMSF (0.2 mM), EDTA (1.0 mM), 0.5% Nonidet P-40
(NP-40), and 3-[(3-cholamidopropyl)dimethylammonio]-
1-propanesulfonate (CHAPS; 25 mM) was added carefully,
mixed for 2.5 h and centrifuged for 15 min at
10000 rpm to remove any insoluble material. Duplicate
or triplicate samples were taken for protein determination
[11]. Samples were stored at -80°C prior to isoelectric
focusing (IEF).
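
The reagent volumes in Section 2.4.4 scale linearly with the wet weight (WW) of the pellet, so they can be tabulated with a small helper. The sketch below is illustrative only: the function and key names are hypothetical, and the SDS coefficient (0.089) is read from a partly illegible figure in the scanned text, so it should be treated as an assumption.

```python
def lysis_volumes_ul(wet_weight_mg):
    """Return reagent volumes in microliters for a pellet of given wet weight (mg)."""
    if wet_weight_mg <= 0:
        raise ValueError("wet weight must be positive")
    return {
        # mQ water used to resuspend the thawed pellet: 1.89 uL per mg WW
        "water_ul": 1.89 * wet_weight_mg,
        # 10% SDS with mercaptoethanol; coefficient assumed (garbled in the scan)
        "sds_mercaptoethanol_ul": 0.089 * wet_weight_mg,
        # DNase I / RNase A solution: 0.329 uL per mg WW
        "nuclease_mix_ul": 0.329 * wet_weight_mg,
    }


# Example: a 10 mg pellet
volumes = lysis_volumes_ul(10.0)
```

Keeping the three coefficients in one place makes it easy to correct the assumed value if a clean copy of the paper shows a different number.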
2.4.5 Preparation of frozen tumor tissue

The technique has been described previously [3, 12].
Briefly, the sample is ground while frozen to a fine powder,
homogenized, lyophilized and solubilized in sample 
buffer. 

2.4.6 Control of representativity

The tumors were examined routinely by experienced 
pathologists and smears or imprints from the samples 
were also assessed for cytometric DNA content by 
microspectrophotometry. 

2.5 2-D PAGE 

2-D PAGE was performed as described [8.10] except for 
the following details. The glass tubes for IEF, 1.2 X 200 
mm, contained 2.0% Resolyte, pH 4-8 (BDH) and were 
cast to a height of 180 mm. A stock solution of aery I - 
amide fServa) and A r .A"-methylenebisacrylamide (167:1 
for IEF and 37.5:1 for the second dimension) was deio- 
nized by mixing with 5% w/v Duolite MB 5313 mixed- 
resin ion exchanger (BDH) for 30 min. filtered (with a 
0.22 \xm nitrocellulose filter) and stored at — 70°C. 
A.A"-Methvienebisacrylamide, A T ,A\A\N'-tetramethyleth- 
ylenediamine (TEMED) and ammonium persulfate were 
purchased from Bio-Rad. IEF tubes were prefocused at 
200 V in 60 min. To each tube a sample corresponding to 
20-40 ug protein was applied and focused for 14.5 h at 
800 V and finally 1.0 h at 1000 V using a Protean II cell 
(Bio-Rad) and Model 1000/500 Power Supply (Bio-Rad). 
The tube gels were finally extruded into 1.25 mL equilibration
buffer, containing 60 mM Tris, pH 6.8 (2% SDS,
100 mM dithiothreitol and 10% glycerol), frozen on dry
ice and stored at -70°C. The acrylamide concentration of
the second dimension gel (1.0 × 180 × 90 mm) was 10% T,
and the gel contained 375 mM Tris, pH 8.8, and 0.1%
SDS. IEF gels were applied on top of the slab gel, sealed
with 0.5% agarose containing electrophoresis running
buffer (60 mM Tris-base, 0.2 M glycine and 0.1% SDS)
and electrophoresed with 10-11 mA per gel (constant
current) at +10°C. Six gels were run together in a Protean
II xi 2-D Multi-Cell (Bio-Rad). Proteins were visualized
by silver staining and photographed with the acidic
side to the left [13, 14].
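
As a rough cross-check of the focusing conditions given above, the volt-hours delivered in each IEF step can be summed, and the crosslinker percentage (%C) implied by the 37.5:1 acrylamide:bisacrylamide ratio can be computed. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the quoted ratio is a weight ratio and that %C is defined as bisacrylamide over total monomer.

```python
# IEF schedule as stated in the text, as (volts, hours) pairs.
# Prefocusing is kept separate from the focusing run itself.
PREFOCUS = [(200, 1.0)]
FOCUS = [(800, 14.5), (1000, 1.0)]


def volt_hours(steps):
    """Sum volts x hours over a list of (volts, hours) steps."""
    return sum(volts * hours for volts, hours in steps)


def crosslinker_percent(acrylamide_parts, bis_parts=1.0):
    """%C: bisacrylamide as a weight percentage of total monomer (assumed definition)."""
    return 100.0 * bis_parts / (acrylamide_parts + bis_parts)


focus_vh = volt_hours(FOCUS)                 # focusing run only
total_vh = volt_hours(PREFOCUS) + focus_vh   # including prefocusing
c_second_dimension = crosslinker_percent(37.5)

print(f"Focusing: {focus_vh:.0f} Vh; with prefocusing: {total_vh:.0f} Vh")
print(f"Second dimension %C: {c_second_dimension:.2f}")
```

Under these assumptions the focusing run corresponds to 12 600 Vh (12 800 Vh including prefocusing), and a 37.5:1 ratio corresponds to roughly 2.6% C.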

2.6 Identification of polypeptides 

Vimentin and vimentin-derived polypeptides were identified
by extraction of an MDA-231 cell lysate with 0.6 M
KCl/0.5% NP-40 [15]. Tropomyosins were extracted
from MDA-231 and WI38 cell lysates [16], and cytokeratins
were extracted from MDA-231 and MCF-7 cell
lysates [17]. The patterns were compared with published
maps [19-21]. Proliferating cell nuclear antigen (PCNA)
was identified by immunoblotting (PC10 mAb, Dakopatts)
using a semidry system (Multiphor II Nova Blot,
Pharmacia-LKB Biotechnology AB) and enhanced chemiluminescence
(ECL) detection (Amersham).

3 Results 

3.1 2-DE of samples prepared from normal and
tumorigenic cultured cells 

The object of this study was to develop methods for pre- 
paration of 2-DE maps from human tumor tissue which 
have the same high resolution as those obtained from 
cultured cells. Shown in Fig. 2 are high resolution 2-DE 
gels prepared from cultured cells and one leukemia: 
SV40 transformed embryonal rat fibroblasts WT2 (Fig. 
2a); human MDA-231 breast carcinoma cells (Fig. 2b);
human WI38 fibroblasts (Fig. 2c) and human pre B-ALL 




Figure 3. 2-DE analysis of a case of lung adenocarcinoma (LA). Comparison of 2-DE gel quality between (A) frozen and (B) fresh (needle
aspiration) tissue preparation.
cells (Fig. 2d). Polypeptides were identified through a
laboratory exchange of cell samples/2-DE maps and 
through 2-DE analysis of purified proteins (Table 1). 

3.2 Preparation of samples from solid tumors 
3.2.1 Fresh versus frozen tissue 

An adenocarcinoma of the lung (LA) was prepared for 
2-DE by conventional methods using frozen material 
(Fig. 3a). There are several possible explanations for the poor
resolution using frozen tissue, including the presence of high
molecular weight protein aggregates. Filtering extracts
through 0.1 µm filters (Durapore, Millipore) resulted in
a slightly improved resolution (not shown). When fresh 
tumor tissue from tumor LA was used for sample prepa- 
ration, using fine needle aspiration to collect the cells, 
the resolution was considerably improved (Fig. 3b). The 
use of fresh tissue resulted in a general increase in resolution,
which was most pronounced in the 50-100 kDa
molecular mass range. A number of differences in the 
protein profiles of the gels in Figs. 3a and 3b can be ob- 
served, some of which are indicated in the figures. The 
decrease in serum albumin in Fig. 3b is likely to result 
from loss of serum proteins occurring when cells were 
pelleted after aspiration. Other differences, such as the 
decreased level of transformation-sensitive tropomyosins 
(TM1-TM3), may result from enrichment of tumor cells 
in the sample of Fig. 3b. Fine needle aspiration, a well- 
established technique in cytology, extracts mainly tumor 
cells because of decreased intercellular adhesiveness of 
neoplastic cells as compared to normal tissue. Micros- 
copic examination of Diff-Quick-stained extracted cells 
from case LA revealed almost 100% tumor cells, 
whereas the whole tissue extract contained approximately
60% tumor cells.
Table 1. Names and abbreviations for identified spots
Figure 4. 2-DE analysis of a case of breast carcinoma (BC). Comparison of 2-DE quality and some differences in detected spots (arrowheads
indicate increased intensity and circles or brackets indicate decreased intensity of the same spots) between (A) enzymatically and (B)
nonenzymatically (scraped) tissue preparation.
3.2.2 Comparison of different methods for preparing
cells from fresh tumor tissue 

Samples were prepared from breast and lung carcinomas 
using either an enzymatic treatment with collagenase/ 
elastase or using nonenzymatic preparations (Fig. 4). A 
number of differences in the protein profiles were ob- 
served in the resulting 2-DE gels, some of which are 
indicated in Figs. 4a and b. These differences include 
both increases and decreases in spot intensity. These dif- 
ferences may result from degradation of high molecular 
weight polypeptides during enzymatic treatment, in- 
creased solubilization of polypeptides, or may have other 
causes. For many tumors, it was only possible to obtain 



small amounts of material since they were reserved for
other examinations. In these cases, samples could be prepared
for 2-DE using either needle aspiration or
scraping. Figure 5a shows a 2-DE gel prepared from
squamous lung carcinoma (LS) cells collected by needle
aspiration and Fig. 5b shows a gel prepared from the 
same tumor by scraping. In this case, a number of differ- 
ences were recorded between the two procedures, some 
of which are arrowed in Fig. 5. Samples obtained from 
other tumors (breast and lung) generally showed fewer 
differences between these two methods of cell sampling 
(not shown). These data show that different nonenzymatic
extraction procedures may yield different polypeptide
patterns. However, the number of spots with a large
Figure 5. 2-DE analysis of a case of lung cancer (LS). Comparison of 2-DE gel quality and detected spots (arrowheads and circles) between
(A) aspirated (needle aspiration) and (B) scraped preparations from fresh tissue.




Figure 6. 2-DE analysis of three other types of tumors, (A) hypernephroma, (B) an adenoma of the thyroid and (C) corpus cancer, using the
nonenzymatic preparation technique. Arrowheads and circles indicate some cytosolic polypeptides.
difference in intensity was lower than when a nonenzymatic
preparation was compared with an enzymatic preparation.

2-DE maps of satisfactory quality were prepared by a 
third procedure. Cells were released from small pieces of 
tumor by squeezing (see Section 2). Some examples of 
this are shown in Fig. 6 where 2-DE maps derived from 
a case of hypernephroma, KH (Fig. 6a), a case of thyroid
tumor, TA (Fig. 6b) and a case of corpus cancer, CP (Fig.
6c) can be seen. We conclude that nonenzymatic tech- 
niques are useful for 2-DE analysis of a number of dif- 
ferent tumors. The quality of the resulting gels is com- 



parable to that obtained using cultured cells (compare 
the gels in Fig. 2 with those in Figs. 4, 6 and 7). Which of
these methods will be optimal will, in our experience, 
depend on the tumor material. For example, very small 
tumors are preferably extracted by squeezing; on the 
other hand, breast cancers (which are often fibrous) 
yield satisfactory samples using scraping. 

3.2.3 Purification of cells on Percoll gradients

We considered the possible advantage of separating 
viable cells from dead cells, erythrocytes, and debris 
using discontinuous Percoll gradients. Cells collected 
Figure 7. 2-DE analysis of polypeptides from viable (b and d) and nonviable (a and c) cells of an adenocarcinoma of the lung (LB)
separated using discontinuous Percoll density gradient. Nonenzymatic preparation technique (a and b) and enzymatic preparation
technique (c and d) are compared.
from the interphase showed a viability of more than
90% as judged by trypan blue exclusion test. However, it
was found that the yield of viable cells decreased dramatically
if the tissue resection was not immediately processed.
To study the effect of lysis of cells during the preparation
procedure, 2-DE maps were prepared from
nonenzymatically extracted cells of case LB collected 
from the top fraction (nonviable, Fig. 7a) and interphase
fraction (viable, Fig. 7b). These 2-DE maps were
compared with corresponding fractions (nonviable, Fig.
7c, and viable, Fig. 7d) of enzymatically extracted cells.
One clear disadvantage of the enzymatic technique was 
that when loss of cell viability occurred during prepara- 
tion, a dramatic loss of high molecular weight polypep- 
tides was observed (Fig. 7c). This was probably due to 
degradation of intracellular proteins. However, nonenzy- 
matic preparations showed fewer differences between 
viable and nonviable cells: the most pronounced alteration
was a decrease of a group of mucin-related proteins
(Fig. 7b). We conclude, therefore, that a discontinuous
Percoll gradient is necessary after enzymatic
extraction of cells, but can be omitted from the nonenzymatic
tumor sample preparation procedure.

We used the MDA-231 cell line to study the effects of
cell lysis and leakage of cytosolic polypeptides during
sample preparation. Remarkably, after 30, 50, 80 and 140
min of incubation in PBS/PIH at 0°C, no significant
changes were observed in the 2-DE pattern (not shown).
Although loss of cell viability may not result in protein
degradation when cells are incubated in the presence of
protease inhibitors, loss of cytosolic proteins would be
expected during pelleting of cells. We monitored the loss
of lactate dehydrogenase (LDH) activity into the supernatant
during incubation in PBS of MDA-231 and MCF-7
breast cancer cells at 20°C. In both cases, loss of viability
was paralleled by release of LDH from the cells
(Fig. 8). After 5 h, 70% of the MCF-7 cells, but only 30%
of the MDA-231 cells were dead (not shown).
Figure 8. The relative release (fraction in supernatant of total) of lactate
dehydrogenase activity (LDH) and cell viability versus incubation
time of the mammary carcinoma cell lines MDA-231 and MCF-7
during incubation in PBS at 20°C.
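
The relative LDH release plotted in Fig. 8 is described as the fraction of total activity found in the supernatant. A minimal sketch of that calculation follows. The function name and the activity readings are hypothetical placeholders, not data from the study, and it assumes that total activity is the sum of the supernatant and cell-associated activities.

```python
def fraction_released(supernatant_activity, cell_activity):
    """Fraction of total LDH activity found in the supernatant.

    Total activity is assumed to be supernatant plus cell-associated activity.
    """
    total = supernatant_activity + cell_activity
    if total <= 0:
        raise ValueError("total LDH activity must be positive")
    return supernatant_activity / total


# Hypothetical readings (arbitrary units): (hours, supernatant, cell-associated).
# These values are placeholders for illustration only.
time_course = [(0, 5.0, 95.0), (2, 20.0, 80.0), (5, 60.0, 40.0)]

release = [(hours, fraction_released(sup, cells)) for hours, sup, cells in time_course]

for hours, fraction in release:
    print(f"{hours} h: {fraction:.0%} of LDH activity in supernatant")
```

Plotting such fractions against incubation time, alongside viability measured by dye exclusion, would give curves of the kind the figure describes.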



These data indicate the impact of a rapid preparation 
procedure, at low temperature, of fresh tumor samples. 
Experiments have also been performed using only
1.07 g/mL Percoll (Fig. 6c and Fig. 1, left test tube) in
order to remove erythrocytes. One clear advantage with 
this procedure, which today is routinely utilized, is a 
higher yield of viable cells, probably due to decreased 
sample preparation time. 



4 Discussion 

We describe procedures for sample preparation from 
solid tumors for 2-DE. 2-DE maps could be derived 
from solid tumors which were similar in quality to those 
obtained from cultured cells. Compared to methods 
using frozen material, the resolving power of the 2-DE 
technique is increased, allowing examination of a large 
number of polypeptides from tumors of different malignancies.
Other investigators [12, 22] have used samples
from frozen tumors to derive 2-DE maps. We have previ- 
ously described disadvantages encountered using frozen 
tumor samples including variations in contaminating pro- 
teins between different samples [3]. The methods de- 
scribed here are based on the preparation of cells from 
tumors without enzymatic digestion. The enzymatic step 
could be avoided since malignant cells usually grow as 
solid masses which are not strongly attached to the 
matrix. Furthermore, we found that omitting the enzy- 
matic digestion alleviated the necessity of purifying 
viable tumor cells on Percoll gradients. This was in sharp 
contrast to enzymatically treated samples, where loss of 
viability leads to loss of high molecular weight proteins 
(Fig. 7c). 

At least in the case of lung cancer, viable and nonviable 
cells showed small differences with respect to 2-DE maps.
Presumably, protease inhibitors penetrate cells and 
inhibit proteolysis. In model experiments, we observed 
leakage of cytosolic protein (LDH) from the cells in
parallel to loss of viability. Apparently, however, only a 
limited decrease of the level of low molecular weight 
cytosolic polypeptides was detected using silver staining
combined with visual inspection. We have found that 
although some tumors are well suited for the prepara- 
tion procedure described, others are not. In general, 
good results were obtained using tumors of the lung, 
breast, corpus and lymphomas. In contrast, cells from 
thyroid adenomas and hypernephroma showed poor via- 
bility. We were in these cases unable to separate nonvi- 
able cells from viable cells, and we can therefore not 
evaluate the consequence of the loss of viability on 
2-DE patterns, apart from a loss of some low molecular 
weight cytosolic polypeptides.

Highly differentiated tumors may show lower viability as 
compared with poorly differentiated tumors (Dr. Farkas 
Vanky, personal communication). A number of samples 
from thyroid tumors were prepared for 2-DE but most 
cases showed poor viability. We believe that special care 
is needed during preparation of generally highly differen- 
tiated tumor groups. The difference between loss of via- 
bility/leakage of LDH of the more differentiated MCF-7 
cells and the less differentiated MDA-231 cells is in line 
with these observations (Fig. 8). A number of potential
and interesting markers, like tropomyosin isoforms, cytokeratins
and heat shock proteins, appear to be insensitive
to loss of viability during the preparation procedure.
We have to date made numerous observations of alterations
in the expression of these polypeptides in breast
cancers and lung cancers.

Another problem that may occur, irrespective of sample 
preparation techniques used, is admixture of lympho- 
cytes. These cases are easily detectable in smears and it 
may therefore be possible to select lymphocyte specific 
spots as "internal markers" for the 2-D PAGE analysis. 
Studies using this approach are in progress. Many of the 
polypeptides identified are structural (Table 1). Since the 
expression of many of these polypeptides is known to
vary between normal and malignant cells, the possibility 
to determine their expression simultaneously is 
appealing. In the specific case of breast cancer, altera- 
tions in the expression of intermediate filament proteins 
(cytokeratins) are known to occur during tumor progres- 
sion [23]. Other proteins known to be differentially 
expressed between normal cells and transformed cells
are tropomyosins, numatrin/B23, heat shock proteins
and PCNA. To this end, we have observed alterations in
the expression of cytokeratin 8, hsp 90, and non-muscle
tropomyosin isoform 2 during malignant progression
(Okuzawa et al., in preparation and Franzén et al., in
preparation).

The method of choice for sample preparation from 
tumor tissues will depend on the properties of the tumor 
material studied. It may be important to use only one 
method when comparing cases within one group, as differences
were observed between methods. The advantages
of the nonenzymatic techniques are (i) that they minimize
contamination with connective tissue, (ii) that
problems with contamination of serum proteins are
avoided, and (iii) that separation of viable and dead cells
is not necessary. Hereby the resolving power of 2-D
PAGE is maximized for the analysis of human tumors
and studies on inter-tumor variations in gene expression
are facilitated. In addition, the polypeptide patterns obtained
may be more representative for the in vivo tumor
cell since the use of enzymes and incubations has been
minimized.
Large Scale Biology Corporation

Large Scale Biology Corporation is the leader in the integrated discovery, production 
and application of proteins - the functional units of all biological processes. 

Large Scale Biology Corporation (LSB, Vacaville, CA) and its subsidiary Large Scale 
Proteomics Corp. (LSP, Germantown, MD) are a biotechnology enterprise with the mission of 
accelerating the speed and productivity of the life sciences industry product discovery and 
development programs. Unique among biotechnology companies is LSB's integration of 
technologies to discover, analyze, manufacture and find new applications for proteins - the 
functional units of all biological processes. 

Genomics companies have focused on deciphering genetic information, providing an initial but 
only partial understanding of biological processes. LSB's proprietary protein technologies can 
enable the transformation of genomic information into products such as drug targets, 
therapeutics, diagnostics for drug efficacy and toxicity, and traits for agricultural crops. Large 
Scale Biology has gone beyond the "genomics" realm in its business model and developed 
ways to integrate the discovery of gene function with quantitative protein analysis and protein 
manufacturing. This integration of technology platforms favorably positions LSB as a leading 
provider of valuable content to industry leaders in the fields of diagnostics, therapeutics, 
vaccines and agribusiness. 

LSB was founded in 1987 with the goal of commercializing its proprietary GENEWARE viral 
vector system - a novel technology for gene expression. Using safe RNA viruses to transiently 
express genes in non-recombinant plants, LSB has positioned itself in the industry to provide 
cost-effective manufacturing and purification of diverse protein and peptide products. The 
same technology can be applied to the expression of libraries of foreign genes in an 
automated, high-throughput format to discover the function of genes with unparalleled 
efficiency. The GENEWARE system and associated proprietary technologies form the basis 
for LSB's functional genomics, biomanufacturing and a variety of proprietary products under 
development. 

From its foundation, LSB understood the need to integrate functional genomic and protein 
manufacturing expertise with quantitative protein analysis and informatics to become a 
world-leader in the protein field. In 1999, LSB acquired a privately held pharmaceutical 
proteomics company originally founded in 1985. Large Scale Proteomics Corporation (a wholly 
owned subsidiary of Large Scale Biology Corporation) is an industry leader in identifying and
characterizing proteins in all types of biological samples for the discovery and development of 
new and more effective therapies, diagnostics, and agricultural products. 

"Proteomics" is the study of the entire complement of proteins expressed in a cell, tissue, or 
organism. Proteomics can significantly improve drug discovery and development because 
most illness is associated with imbalances among, or malfunctions of, proteins. Only a small 
fraction of diseases can be attributed to the presence of a defective gene. Unlike classical 
genomics approaches that discover genes that may relate to a disease, LSP has developed a 
proprietary system called the ProGEx module for directly characterizing proteins associated 
with disease. Using this same technology, LSP can characterize the effects of candidate drugs 
intended to reverse a disease process, and to determine the degree to which this objective is 
achieved free of adverse side effects. 

LSB and LSP have protected their many discoveries through an extensive portfolio of domestic
and foreign patents and have developed commercial alliances and partnerships to exploit the 
value of their technologies. LSB and LSP scientists and engineers focus on the development 
and application of resources to help clients meet their objectives as well as the development of 
our own proprietary products for subsequent partnering with industry leaders. 

A combined staff of 140 professionals operates from three locations in the United States, with 
a network of collaborators and affiliates throughout the US and Europe. Company 
headquarters, R&D laboratories and its Genomics division are located in Vacaville, California 
about 60 miles northeast of San Francisco. Process development and biomanufacturing take 
place in Owensboro, Kentucky, and LSB's Large Scale Proteomics Corporation subsidiary is 
located in Germantown, Maryland. 

In August, 2000, LSB completed an initial public offering (IPO) of 5 million shares of common 
stock and now trades on the NASDAQ under the symbol LSBC. 

* 

Leadership - Large Scale Biology Corporation 

Robert L. Erwin, Chairman of the Board and Chief Executive Officer, founded LSB™ and has
served as a director and officer since 1987. Mr. Erwin is the former chairman of the State of 
California Breast Cancer Research Council and currently serves on the University of California 
President's Engineering Advisory Council. He is Chairman of the Supervisory Board of Icon 
Genetics AG. As a co-founder of Sungene Technologies Corp., Mr. Erwin served as Vice 
President of Research and Product Development from 1981 through 1986. He has served on 
the Biotechnology Industry Advisory Board for Iowa State University. Mr. Erwin received his 
M.S. degree in Genetics from Louisiana State University and is an inventor on several LSB 
patents. 

David R. McGee, Ph.D., a co-founder of LSB and Senior Vice President and Chief Operating
Officer, has been an officer since 1987. Prior to joining LSB, Dr. McGee was Vice President of 
Operations at Sungene Technologies Corporation from 1983 to 1987. Dr. McGee received his 
Ph.D. in Genetics from Louisiana State University and served as a faculty instructor of zoology 
and genetics at Louisiana State University. 

Laurence K. Grill, Ph.D., a co-founder of LSB and Senior Vice President, Research and
Development, has served as an officer since 1987. Dr. Grill was the Manager of Plant 
Molecular Biology for Sandoz Crop Protection Corp. from 1984 to 1987 and Senior Research 
Scientist in the Department of Molecular Biology at Zoecon Research Institute from 1980 to
1984. He received his Ph.D. from the University of California at Riverside with an emphasis on 
the molecular basis for viral gene expression in plants. 

R. Barry Holtz, Ph.D., Senior Vice President, Biopharmaceutical Manufacturing, has served
the company as an officer since 1989 upon the acquisition of Holtz Bio-Engineering, which 
was founded in 1980. Dr. Holtz was a co-founder and Director of Research for MFI, Inc., the 
largest manufacturer of microencapsulated nutrients for agriculture and Director of 
Fundamental Research at Foremost-McKesson, Inc. Dr. Holtz received his Ph.D. in 
Biochemistry from Pennsylvania State University and served as Assistant Professor in the 
Department of Food Science and Nutrition at Ohio State University. 

Daniel Tuse, Ph.D., has been an officer of LSB since he joined the Company in 1995 as Vice 
President, Pharmaceutical Development. Dr. Tuse manages the company's pharmaceutical 
design and development programs, including LSB's novel vaccines and immunotherapeutics 
initiatives. Prior to joining LSB, Dr. Tuse was Assistant Director of SRI International's (Menlo 
Park, Calif.) Life Sciences Division. In his 17 years at SRI, Dr. Tuse developed extensive R&D 
experience in pharmaceuticals and specialty chemicals, serving an international list of clients. 
Dr. Tuse received his Ph.D. in Microbiology (1980, cum laude) with a minor in Toxicology from 
the University of California, Davis. 

John S. Rakitan, a co-founder of LSB, Senior Vice President & General Counsel and 
Secretary, has served as an officer since 1988. Prior to joining LSB, Mr. Rakitan was an 
attorney in private practice. Mr. Rakitan received his J.D. degree from the University of Notre 
Dame. 

Michael D. Centron, Treasurer, has served as Controller since 1988 and was elected as 
Treasurer in 1991. Mr. Centron was Audit Supervisor for Varian Associates from June 1985
through July 1988, and he also worked for Arthur Young and Co. (currently Ernst & Young).
Mr. Centron is a certified public accountant and received his M.B.A. degree from the University 
of California at Berkeley. 

Guy della-Cioppa, Ph.D., is an officer of the company and currently serves as Vice President, 
Genomics. Prior to joining the company in 1989, Dr. della-Cioppa worked for Monsanto 
Company in St. Louis, MO from 1984-1989 and was an NIH Postdoctoral Fellow at the 
Worcester Foundation for Experimental Biology in Shrewsbury, MA from 1983-1984. He 
received his Ph.D. in Biology from the University of California, Los Angeles. 

William M. Pfann joined Large Scale Biology in August 2000 as Senior Vice President Finance 
and Chief Financial Officer. Mr. Pfann was formerly with PricewaterhouseCoopers LLP from 
1969 to July 2000, most recently as the Risk Management Partner for the Western Region. He 
served in a number of management roles at PwC, including leader of the firm's Silicon Valley 
audit practice, National Director of the networking and communications sector and Managing 
Partner of the Northern California emerging business group, as well as Partner-in-Charge of 
the Oakland and Walnut Creek, California offices. Mr. Pfann received a B.S. degree from the 
University of California, Berkeley, in Business Administration and an MBA in Accounting from 
Golden Gate University. 
Leadership - Large Scale Proteomics Corporation

N. Leigh Anderson, Ph.D., Chairman, President and CEO of Large Scale Proteomics
Corporation (LSP™). Dr. Anderson obtained his B.A. in Physics with honors from Yale and a 
Ph.D. in Molecular Biology from Cambridge University (England) working with M. F. Perutz as 
a Churchill Fellow at the MRC Laboratory of Molecular Biology. Subsequently he co-founded 
the Molecular Anatomy Program at the Argonne National Laboratory (Chicago) where his 
work in the development of 2-dimensional electrophoresis (2-DE) and molecular database 
technology earned him, among other distinctions, the American Association for Clinical 
Chemistry's Young Investigator Award for 1982 and the 1983 Pittsburgh Analytical Chemistry 
Award. In 1985 Dr. Anderson co-founded LSP (originally Large Scale Biology Corp., 
Germantown, MD) in order to pursue commercial development and large-scale applications 
of 2-D electrophoretic protein mapping technology. 

Norman G. Anderson, Ph.D., Chief Scientist at LSP. Dr. Anderson has a distinguished record
as an inventor. His career includes senior positions at Oak Ridge and Argonne National 
Laboratories (ORNL and ANL), more than 300 scientific publications, and the receipt of more 
than 20 prestigious awards in recognition of his work in science and technology. For his 
invention of the zonal ultracentrifuge, he received the John Scott Medal Award, and for the 
centrifugal fast analyzer, the Preis Biochemische Analytik für Klinische Chemie from Die
Deutsche Gesellschaft für Klinische Chemie for the most outstanding analytical development
in clinical chemistry worldwide during a 2-year period. In 1984 ANL awarded him its career
patent leader award for the largest number of patents issued to an employee. At that time the 
commercial value of his inventions in terms of U.S. sales and royalties from foreign licensing 
were $250 million and $1 million, respectively. Dr. Anderson received his degrees at Duke 
University: a B.A. in Zoology, M.A. in Physiology, and Ph.D. in Cell Physiology. He holds 28 
patents. 

Constance Seniff, Vice President, Operations. Ms. Seniff has managed LSP's operations
since 1993. Her background includes thirteen years in international business prior to joining 
LSP, five abroad in the employ of foreign firms. Ms. Seniff is responsible for helping 
formulate and implement business development and database commercialization strategies 
for LSP in coordination with the management of LSP's parent company, Large Scale Biology 
Corporation. Ms. Seniff has a B.Sc. degree in Business (with honors) from Florida State 
University. 

Robert J. Walden, Vice President, Finance at LSP. Mr. Walden joined LSP in 1997 and has 
served as a director since 1999. He previously served as Vice President of Finance and
Administration at Osiris Therapeutics, Inc., and as Chief Financial Officer at the American 
Type Culture Collection (ATCC). Mr. Walden received his degree in Finance from the 
University of Maryland. 

Jean-Paul Hofmann, Ph.D., Vice President, Software Development at LSP. Dr. Hofmann is a
plant geneticist by training, having earned a B.S. in Biology, M.S. in Biochemistry and 
Genetics, and Ph.D. in Plant Genetics from the University of Orsay, Paris. He has extensive 
experience in using 2-DE in agronomic research and in designing analytical software for 1-
and 2-D applications. He has held senior scientific positions in industry and research 
institutes, in the U.S., France and the Ivory Coast. 

John Taylor, Ph.D., Vice President, Software Development and Bioinformatics. Dr. Taylor is
the principal developer of Kepler™, LSP's analytical software for automated 2-DE pattern 
analysis. Prior to joining LSB, Dr. Taylor served as computer scientist in the Molecular 
Anatomy Program at Argonne, and on the research staffs of the University of Chicago and 
the Armed Forces Institute of Pathology in Washington, D.C. Dr. Taylor received a B.S. in 
Physics from the University of South Carolina, and a Ph.D. in Nuclear Physics from Duke 
University. 

Sandra Steiner, Ph.D., currently serves as Vice President Proteomics Applications. Prior to 
joining the Company, Dr. Steiner founded and directed the Molecular Toxicology Group at 
Novartis in Basel, Switzerland and was a member in several multi-disciplinary drug 
development project teams. Dr. Steiner received her Ph.D. in Toxicology/Pharmacology from 
the University of Basel, Switzerland. 